Project description:BACKGROUND: Identification of protein-protein interactions is an important first step toward understanding living systems. High-throughput experimental approaches have accumulated a large amount of information on protein-protein interactions in human and other model organisms. Such interaction information has been successfully transferred to other species in which experimental data are limited. However, this annotation transfer method can yield false-positive interologs when applied to phylogenetically distant organisms, because interactions are not always conserved. RESULTS: To address this issue, we used the phylogenetic profile method to filter false positives from interologs, based on the notion that evolutionarily conserved interactions show similar patterns of occurrence across genomes. The approach was applied to Mus musculus, for which experimentally identified interactions are limited. We first inferred protein-protein interactions in Mus musculus using two approaches: i) identifying mouse orthologs of interacting proteins (interologs) based on experimental protein-protein interaction data from other organisms; and ii) analyzing the frequency of mouse ortholog co-occurrence in predicted operons of bacteria. We then filtered possible false positives from the predicted interactions using phylogenetic profiles. We found that this filtering method significantly increased the frequency of interacting protein pairs coexpressed in the same cells/tissues in the Gene Expression Omnibus (GEO) database, as well as the frequency of interacting protein pairs sharing similar Gene Ontology (GO) terms for biological processes and cellular localizations. The data support the notion that phylogenetic profiles help to reduce the number of false positives in interologs. CONCLUSION: We have developed a protein-protein interaction database for mouse, which contains 41109 interologs.
We have also developed a web interface to facilitate use of the database: http://lgsun.grc.nia.nih.gov/mppi/.
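The phylogenetic profile filter described in this abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure or cutoff was used, so the Jaccard index, the threshold of 0.5, the protein names, and the five-genome profiles below are all assumptions chosen for illustration only.

```python
from typing import Dict, List, Tuple

# A profile records, per reference genome, whether an ortholog is present (1) or absent (0).
Profile = Tuple[int, ...]
Pair = Tuple[str, str]


def jaccard(a: Profile, b: Profile) -> float:
    """Jaccard similarity of two presence/absence profiles (an assumed measure)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def filter_interologs(
    pairs: List[Pair], profiles: Dict[str, Profile], threshold: float = 0.5
) -> List[Pair]:
    """Keep predicted pairs whose profiles are at least `threshold` similar.

    Pairs with a missing profile are dropped; the threshold is hypothetical.
    """
    kept = []
    for p, q in pairs:
        if p in profiles and q in profiles:
            if jaccard(profiles[p], profiles[q]) >= threshold:
                kept.append((p, q))
    return kept


# Toy data: hypothetical proteins scored against five hypothetical genomes.
profiles = {
    "ProtA": (1, 1, 0, 1, 0),
    "ProtB": (1, 1, 0, 1, 1),
    "ProtC": (0, 0, 1, 0, 1),
}
candidates = [("ProtA", "ProtB"), ("ProtA", "ProtC")]
kept = filter_interologs(candidates, profiles, threshold=0.5)
print(kept)
```

In this toy case ProtA and ProtB co-occur in three of the four genomes where either is present, so the pair is retained, while ProtA and ProtC never co-occur and the pair is filtered out.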
Project description:Here we report the expansion of the genetic code of Mus musculus with various unnatural amino acids including Nε-acetyl-lysine. Stable integration of transgenes encoding an engineered Nε-acetyl-lysyl-tRNA synthetase (AcKRS)/tRNAPyl pair into the mouse genome enables site-specific incorporation of unnatural amino acids into a target protein in response to the amber codon. We demonstrate temporal and spatial control of protein acetylation in various organs of the transgenic mouse using a recombinant green fluorescent protein (GFPuv) as a model protein. This strategy will provide a powerful tool for systematic in vivo study of cellular proteins in the most commonly used mammalian model organism for human physiology and disease.
Project description:Individual researchers are struggling to keep up with the accelerating emergence of high-throughput biological data, and to extract information that relates to their specific questions. Integration of accumulated evidence should permit researchers to form fewer - and more accurate - hypotheses for further study through experimentation. Here a method previously used to predict Gene Ontology (GO) terms for Saccharomyces cerevisiae (Tian et al.: Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function. Genome Biol 2008, 9(Suppl 1):S7) is applied to predict GO terms and phenotypes for 21,603 Mus musculus genes, using a diverse collection of integrated data sources (including expression, interaction, and sequence-based data). This combined 'guilt-by-profiling' and 'guilt-by-association' approach optimizes the combination of two inference methodologies. Predictions at all levels of confidence are evaluated by examining genes not used in training, and top predictions are examined manually using available literature and knowledge base resources. We assigned a confidence score to each gene/term combination. The results provided high prediction performance, with nearly every GO term achieving greater than 40% precision at 1% recall. Among the 36 novel predictions for GO terms and 40 for phenotypes that were studied manually, >80% and >40%, respectively, were identified as accurate. We also illustrate that a combination of 'guilt-by-profiling' and 'guilt-by-association' outperforms either approach alone in their application to M. musculus.
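The "precision at 1% recall" figure quoted in this abstract can be made concrete with a small sketch. It assumes the common definition (rank predictions by confidence, walk down the list until the target recall is reached, and report precision at that cutoff); the abstract does not spell out the exact procedure, and the scores and labels below are invented toy values.

```python
from typing import List, Tuple


def precision_at_recall(scored: List[Tuple[float, bool]], target_recall: float) -> float:
    """Precision at the first rank cutoff whose recall reaches `target_recall`.

    `scored` holds (confidence, is_true_annotation) for held-out genes.
    Ties in confidence are broken by input order, which is an arbitrary choice.
    """
    total_pos = sum(1 for _, is_true in scored if is_true)
    if total_pos == 0:
        raise ValueError("no positive examples to recall")
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    true_pos = 0
    for n, (_, is_true) in enumerate(ranked, start=1):
        if is_true:
            true_pos += 1
        if true_pos / total_pos >= target_recall:
            return true_pos / n
    # Only reached if target_recall exceeds 1.0.
    return true_pos / len(ranked)


# Toy held-out predictions for one hypothetical GO term.
scored = [(0.9, True), (0.8, False), (0.7, True), (0.4, True), (0.2, False), (0.1, True)]
print(precision_at_recall(scored, 0.25))
print(precision_at_recall(scored, 0.50))
```

A low recall target such as 1% looks only at the most confident predictions, which is why it is a natural operating point when the top of the list is what gets followed up experimentally.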
Project description:BACKGROUND: Long terminal repeat (LTR) retrotransposons make up a large fraction of the typical mammalian genome. They comprise about 8% of the human genome and approximately 10% of the mouse genome. On account of their abundance, LTR retrotransposons are believed to hold major significance for genome structure and function. Recent advances in genome sequencing of a variety of model organisms have provided an unprecedented opportunity to better evaluate the diversity of LTR retrotransposons resident in eukaryotic genomes. RESULTS: Using a new data-mining program, LTR_STRUC, in conjunction with conventional techniques, we have mined the GenBank mouse (Mus musculus) database and the more complete Ensembl mouse dataset for LTR retrotransposons. We report here that the M. musculus genome contains at least 21 separate families of LTR retrotransposons; 13 of these families are described here for the first time. CONCLUSIONS: All families of mouse LTR retrotransposons are members of the gypsy-like superfamily of retroviral-like elements. Several different families of unrelated non-autonomous elements were identified, suggesting that the evolution of non-autonomy may be a common event. High sequence similarity between several LTR retrotransposons identified in this study and those found in distantly-related species suggests that horizontal transfer has been a significant factor in the evolution of mouse LTR retrotransposons.
Project description:Copy number variation is an important dimension of genetic diversity and has implications in development and disease. As an important model organism, the mouse is a prime candidate for copy number variant (CNV) characterization, but this has yet to be completed for a large sample size. Here we report CNV analysis of publicly available, high-density microarray data files for 351 mouse tail samples, including 290 mice that had not been characterized for CNVs previously. We found 9634 putative autosomal CNVs across the samples affecting 6.87% of the mouse reference genome. We find significant differences in the degree of CNV uniqueness (single sample occurrence) and the nature of CNV-gene overlap between wild-caught mice and classical laboratory strains. CNV-gene overlap was associated with lipid metabolism, pheromone response and olfaction compared to immunity, carbohydrate metabolism and amino-acid metabolism for wild-caught mice and classical laboratory strains, respectively. Using two subspecies of wild-caught Mus musculus, we identified putative CNVs unique to those subspecies and show this diversity is better captured by wild-derived laboratory strains than by the classical laboratory strains. A total of 9 genic copy number variable regions (CNVRs) were selected for experimental confirmation by droplet digital PCR (ddPCR). The analysis we present is a comprehensive, genome-wide analysis of CNVs in Mus musculus, which increases the number of known variants in the species and will accelerate the identification of novel variants in future studies.
Project description:A transcriptome study of mouse hematopoietic stem cells was performed using a sensitive SAGE method, in an attempt to detect medium- and low-abundance transcripts expressed in these cells. Among a total of 31,380 unique transcripts, 17,326 (55%) matched known genes and 14,054 (45%) were low-copy transcripts with no matches to currently known genes. In addition, 3,899 (23%) were alternatively spliced transcripts of known genes and 3,754 (22%) were antisense transcripts from known genes. Overall design: Mouse hematopoietic stem cells were purified from bone marrow cells using negative and positive selection with a Magnetic-Activated Cell Sorter (MACS). Total RNA and mRNA were isolated from the purified cells using Trizol reagent and magnetic oligo dT beads. Double-stranded cDNAs were synthesized using a cDNA synthesis kit and anchored oligo dT primers. After NlaIII digestion, 3’ cDNAs were isolated and amplified through 16-cycle PCR. SAGE tags were released from the 3’ cDNA after linker ligation. Ditags were formed, concatemerized and cloned into a pZERO vector. Sequencing reactions were performed with the ET sequencing terminator kit. Sequences were collected using a Megabase 1000 sequencer. SAGE tag sequences were extracted using SAGE 2000 software.
Project description:Rodent betaherpesviruses vary considerably in genomic content, and these variations can result in distinct pathogenicity. Therefore, the identification of unknown betaherpesviruses in house mice (Mus musculus), the most important rodent host species in basic research, is of importance. During a search for novel herpesviruses in house mice using herpesvirus consensus PCR and attempts to isolate viruses in tissue culture, we identified a previously unknown betaherpesvirus. The primary PCR search in mouse organs revealed the presence of known strains of murine cytomegalovirus (Murid herpesvirus 1) and of Mus musculus rhadinovirus 1 only. However, the novel virus was detected after incubation of organ pieces in fibroblast tissue culture and subsequent PCR analysis of the supernatants. Long-distance PCR amplification including the DNA polymerase and glycoprotein B genes revealed a 3.4 kb sequence that was similar to sequences of rodent cytomegaloviruses. Pairwise sequence comparisons and phylogenetic analyses showed that this newly identified murine virus is most similar to the English isolate of rat cytomegalovirus, thereby raising the possibility that two distinct CMV lineages have evolved in both Mus musculus and Rattus norvegicus.