Dataset Information

An en masse phenotype and function prediction system for Mus musculus.

ABSTRACT:

Background

Individual researchers are struggling to keep up with the accelerating emergence of high-throughput biological data, and to extract information that relates to their specific questions. Integration of accumulated evidence should permit researchers to form fewer - and more accurate - hypotheses for further study through experimentation.

Results

Here a method previously used to predict Gene Ontology (GO) terms for Saccharomyces cerevisiae (Tian et al.: Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function. Genome Biol 2008, 9(Suppl 1):S7) is applied to predict GO terms and phenotypes for 21,603 Mus musculus genes, using a diverse collection of integrated data sources (including expression, interaction, and sequence-based data). This combined 'guilt-by-profiling' and 'guilt-by-association' approach optimizes the combination of two inference methodologies. Predictions at all levels of confidence are evaluated by examining genes not used in training, and top predictions are examined manually using available literature and knowledge base resources.
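To make the two inference styles concrete, here is a toy sketch of each and one way to merge them. It is not the authors' implementation. Guilt-by-association scores a gene/term pair from the fraction of the gene's weighted network neighbours already carrying the term. Guilt-by-profiling is stood in for by a hypothetical nearest-neighbour rule over binary feature sets (e.g. protein domains). The noisy-OR merge in `combined_score` is an assumed rule chosen for illustration. All names and data below are hypothetical.

```python
from typing import Dict, Set

Network = Dict[str, Dict[str, float]]   # gene -> {neighbour: edge weight}
Annotations = Dict[str, Set[str]]       # gene -> known terms (training labels)
Profiles = Dict[str, Set[str]]          # gene -> binary features (hypothetical)


def gba_score(gene: str, term: str, network: Network, annotations: Annotations) -> float:
    """Guilt-by-association: weighted fraction of the gene's neighbours
    that are already annotated with `term`."""
    neighbours = network.get(gene, {})
    total = sum(neighbours.values())
    if total == 0:
        return 0.0
    hits = sum(w for n, w in neighbours.items() if term in annotations.get(n, set()))
    return hits / total


def gbp_score(gene: str, term: str, profiles: Profiles, annotations: Annotations) -> float:
    """Toy guilt-by-profiling: best Jaccard similarity between this gene's
    feature set and that of any *other* gene annotated with `term`."""
    mine = profiles.get(gene, set())
    best = 0.0
    for other, feats in profiles.items():
        if other == gene or term not in annotations.get(other, set()):
            continue
        union = mine | feats
        if union:
            best = max(best, len(mine & feats) / len(union))
    return best


def combined_score(gba: float, gbp: float) -> float:
    """Assumed noisy-OR merge: high when either evidence source is high."""
    return 1.0 - (1.0 - gba) * (1.0 - gbp)


# Hypothetical toy data: g1 is unannotated; g2 carries term_A.
network = {"g1": {"g2": 1.0, "g3": 1.0}}
annotations = {"g2": {"term_A"}, "g3": set()}
profiles = {"g1": {"d1", "d2"}, "g2": {"d1", "d3"}}
a = gba_score("g1", "term_A", network, annotations)   # 0.5: one of two neighbours
p = gbp_score("g1", "term_A", profiles, annotations)  # 1/3: shares d1 of {d1, d2, d3}
print(round(combined_score(a, p), 3))                 # prints 0.667
```

The noisy-OR keeps the example short. A learned combiner trained on held-out genes would be the more usual choice when the two sources differ in reliability across terms.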

Conclusion

We assigned a confidence score to each gene/term combination. The results provided high prediction performance, with nearly every GO term achieving greater than 40% precision at 1% recall. Among the 36 novel predictions for GO terms and 40 for phenotypes that were studied manually, >80% and >40%, respectively, were identified as accurate. We also illustrate that a combination of 'guilt-by-profiling' and 'guilt-by-association' outperforms either approach alone in their application to M. musculus.
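"Precision at 1% recall" is read as follows. Rank the held-out gene/term pairs by confidence score and walk down the list until 1% of the true annotations have been recovered. The precision at that cutoff is the fraction of the predictions accepted so far that are correct. The sketch below computes that quantity under this reading. The function name is hypothetical, and ties in score are broken by input order rather than handled specially.

```python
from typing import Sequence


def precision_at_recall(scores: Sequence[float], labels: Sequence[bool],
                        target_recall: float = 0.01) -> float:
    """Precision at the first rank cutoff where recall reaches `target_recall`.

    `labels[i]` is True if the i-th held-out gene/term pair is a true annotation.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have equal length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Highest-confidence predictions first (stable sort keeps input order on ties).
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    tp = 0
    for k, (_, is_pos) in enumerate(ranked, start=1):
        tp += is_pos
        if tp / n_pos >= target_recall:
            return tp / k
    # Only reached if target_recall > 1.
    return tp / len(ranked)
```

For example, with scores `[0.9, 0.8]` and labels `[False, True]`, the first positive appears at rank 2, so precision at 1% recall is 0.5.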

SUBMITTER: Tasan M

PROVIDER: S-EPMC2447542 | biostudies-literature | 2008

REPOSITORIES: biostudies-literature


Publications

An en masse phenotype and function prediction system for Mus musculus.

Taşan M, Tian W, Hill DP, Gibbons FD, Blake JA, Roth FP

Genome Biology, 2008 Jun 27


PMID: 18613952

Similar Datasets

Project description: BACKGROUND: Identification of protein-protein interactions is an important first step toward understanding living systems. High-throughput experimental approaches have accumulated a large amount of information on protein-protein interactions in human and other model organisms. Such interaction information has been successfully transferred to other species for which experimental data are limited. However, this annotation-transfer method can yield false-positive interologs when applied to phylogenetically distant organisms, because interactions are not always conserved. RESULTS: To address this issue, we used the phylogenetic profile method to filter false positives from interologs, based on the notion that evolutionarily conserved interactions show similar patterns of occurrence across genomes. The approach was applied to Mus musculus, for which experimentally identified interactions are limited. We first inferred protein-protein interactions in Mus musculus using two approaches: i) identifying mouse orthologs of interacting proteins (interologs) based on experimental protein-protein interaction data from other organisms; and ii) analyzing the frequency with which mouse orthologs co-occur in predicted bacterial operons. We then filtered possible false positives from the predicted interactions using the phylogenetic profiles. This filtering significantly increased the frequency of interacting protein pairs that are coexpressed in the same cells/tissues in the Gene Expression Omnibus (GEO) database, as well as the frequency of interacting protein pairs that share similar Gene Ontology (GO) terms for biological processes and cellular localizations. The data support the notion that phylogenetic profiles help to reduce the number of false positives among interologs. CONCLUSION: We have developed a mouse protein-protein interaction database containing 41,109 interologs.
We have also developed a web interface to facilitate use of the database: http://lgsun.grc.nia.nih.gov/mppi/.
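A phylogenetic profile is a presence/absence vector recording whether a gene has an ortholog in each of a set of reference genomes. The filtering idea keeps a predicted interolog only when its two partners have similar profiles. The minimal sketch below assumes plain per-genome agreement as the similarity measure and a hypothetical cutoff of 0.8. The study's actual similarity measure and threshold are not stated here, and the function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple


def profile_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of reference genomes where both genes are present (1) or both absent (0)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def filter_interologs(pairs: Sequence[Tuple[str, str]],
                      profiles: Dict[str, Sequence[int]],
                      threshold: float = 0.8) -> List[Tuple[str, str]]:
    """Keep predicted interacting pairs whose profiles agree at or above `threshold`.

    Pairs with a partner lacking a profile are dropped (an assumed policy).
    """
    kept = []
    for p, q in pairs:
        if p in profiles and q in profiles:
            if profile_agreement(profiles[p], profiles[q]) >= threshold:
                kept.append((p, q))
    return kept
```

Simple agreement is inflated for genes absent from most genomes, since shared absences count as matches. Measures such as mutual information or hypergeometric co-occurrence scores are common alternatives when profiles are sparse.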

Project description: Background: The mammalian vomeronasal organ (VNO) expresses two G-protein coupled receptor gene families that mediate pheromone responses, the V1R and V2R receptor genes. In rodents, there are ~150 V1R genes comprising 12 subfamilies organized in gene clusters at multiple chromosomal locations. Previously, we showed that several of these subfamilies had been extensively modulated by gene duplications, deletions, and gene conversions around the time of the evolutionary split of the mouse and rat lineages, consistent with the hypothesis that V1R repertoires might be involved in reinforcing speciation events. Here, we generated genome sequence for one large cluster containing two V1R subfamilies in Mus spretus, a species closely related to and sympatric with Mus musculus, and investigated evolutionary change in these repertoires along the two mouse lineages. Results: We compare spretus and musculus with respect to genome organization and synteny, as well as V1R gene content and phylogeny, with reference to previous observations made between mouse and rat. Unlike in the mouse-rat comparisons, synteny seems to be largely conserved between the two mouse species. Disruption of local synteny is generally associated with differences in repeat content, although these differences appear to arise more from deletions than from new integrations. Even though unambiguous V1R orthology is evident, we observe dynamic modulation of the functional repertoires: two of seven V1Rb genes and one of eleven V1Ra genes were lost in spretus, two V1Ra genes became pseudogenes in musculus, two additional orthologous pairs were apparently subject to strong adaptive selection, and another divergent orthologous pair was apparently subjected to gene conversion. Conclusion: Therefore, eight of the 18 (~44%) presumptive V1Ra/V1Rb genes in the musculus-spretus ancestor appear to have undergone functional modulation since these two species diverged.
Compared with the rat-mouse split, where modulation is evident from independent expansions of these two V1R subfamilies, divergence between musculus and spretus has arisen more from mutations within coding sequences. These results support the hypothesis that adaptive changes in functional V1R repertoires contribute to the delineation of very closely related species.

OmicsDI is part of the ELIXIR infrastructure
