Improving the Gene Ontology Resource to Facilitate More Informative Analysis and Interpretation of Alzheimer's Disease Data.
ABSTRACT: The analysis and interpretation of high-throughput datasets rely on access to high-quality bioinformatics resources, as well as processing pipelines and analysis tools. Gene Ontology (GO, geneontology.org) is a major resource for gene enrichment analysis. The aim of this project, funded by the Alzheimer's Research United Kingdom (ARUK) foundation and led by the University College London (UCL) biocuration team, was to enhance the GO resource by developing new neurological GO terms, and to use GO terms to annotate gene products associated with dementia. Specifically, proteins and protein complexes relevant to processes involving amyloid-beta and tau have been annotated and the resulting annotations are denoted in GO databases as 'ARUK-UCL'. Biological knowledge presented in the scientific literature was captured through the association of GO terms with dementia-relevant protein records; GO itself was revised, and new GO terms were added. This literature biocuration increased the number of Alzheimer's-relevant gene products associated with neurological GO terms, such as 'amyloid-beta clearance' or 'learning or memory', as well as with neuronal structures and their compartments. Of the total 2055 annotations that we contributed for the prioritised gene products, 526 have associated proteins and complexes with neurological GO terms. To ensure that these descriptive annotations could be provided for Alzheimer's-relevant gene products, over 70 new GO terms were created. Here, we describe how the improvements in ontology development and biocuration resulting from this initiative can benefit the scientific community and enhance the interpretation of dementia data.
Project description:The Gene Ontology (GO) is widely recognised as the gold standard bioinformatics resource for summarizing functional knowledge of gene products in a consistent and computable, information-rich language. GO describes cellular and organismal processes across all species, yet until now there has been a considerable gene annotation deficit within the neurological and immunological domains, both of which are relevant to Parkinson's disease. Here we introduce the Parkinson's disease GO Annotation Project, funded by Parkinson's UK and supported by the GO Consortium, which is addressing this deficit by providing GO annotation to Parkinson's-relevant human gene products, principally through expert literature curation. We discuss the steps taken to prioritise proteins, publications and cellular processes for annotation, examples of how GO annotations capture Parkinson's-relevant information, and the advantages that a topic-focused annotation approach offers to users. Building on the existing GO resource, this project collates a vast amount of Parkinson's-relevant literature into a set of high-quality annotations to be utilized by the research community.
Project description:To address the lack of standard terminology to describe extracellular RNA (exRNA) data/metadata, we have launched an inter-community effort to extend the Gene Ontology (GO) with subcellular structure concepts relevant to the exRNA domain. By extending GO in this manner, the exRNA data/metadata will be more easily annotated and queried because it will be based on a shared set of terms and relationships relevant to extracellular research. By following a consensus-building process, we have worked with several academic societies/consortia, including ERCC, ISEV, and ASEMV, to identify and approve a set of exRNA and extracellular vesicle-related terms and relationships that have been incorporated into GO. In addition, we have initiated an ongoing process of extracting gene product annotations associated with these terms from Vesiclepedia and ExoCarta, converting the extracted annotations to Gene Association File (GAF) format for batch submission to GO, and having the submitted annotations curated by the GO Consortium. As a use case, we have incorporated some of the GO terms into annotations of samples from the exRNA Atlas and implemented a faceted search interface based on such annotations. We have added 7 new terms and modified 9 existing terms (along with their synonyms and relationships) to GO. Additionally, 18,695 unique coding gene products (mRNAs and proteins) and 963 unique non-coding gene products (ncRNAs) that are associated with the terms "extracellular vesicle", "extracellular exosome", "apoptotic body", and "microvesicle" were extracted from ExoCarta and Vesiclepedia. These annotations are currently being processed for submission to GO. As an inter-community effort, we have made a substantial update to GO in the exRNA context. We have also demonstrated the utility of some of the new GO terms for sample annotation and metadata search.
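The conversion step mentioned above can be illustrated with a minimal sketch that serialises one extracted record into the 17-column, tab-separated layout used by GAF 2.x files. The function name, the record values (accession, symbol, reference, evidence code, qualifier, date, assigning group) are hypothetical placeholders chosen for illustration and are not taken from the project's actual pipeline or data.

```python
def to_gaf_line(db, object_id, symbol, qualifier, go_id, reference,
                evidence, aspect, object_type, taxon, date, assigned_by):
    """Build one GAF 2.x row; optional columns are left empty."""
    fields = [
        db,           # 1  DB
        object_id,    # 2  DB Object ID
        symbol,       # 3  DB Object Symbol
        qualifier,    # 4  Qualifier / relation
        go_id,        # 5  GO ID
        reference,    # 6  DB:Reference
        evidence,     # 7  Evidence Code
        "",           # 8  With/From (optional)
        aspect,       # 9  Aspect: P, F or C
        "",           # 10 DB Object Name (optional)
        "",           # 11 DB Object Synonym (optional)
        object_type,  # 12 DB Object Type
        taxon,        # 13 Taxon
        date,         # 14 Date, YYYYMMDD
        assigned_by,  # 15 Assigned By
        "",           # 16 Annotation Extension (optional)
        "",           # 17 Gene Product Form ID (optional)
    ]
    return "\t".join(fields)


# Placeholder record: every value except the GO ID format is invented.
line = to_gaf_line(
    db="UniProtKB", object_id="P00000", symbol="GENE1",
    qualifier="located_in", go_id="GO:0070062",  # extracellular exosome
    reference="PMID:0000000", evidence="IDA", aspect="C",
    object_type="protein", taxon="taxon:9606",
    date="20160101", assigned_by="ExampleGroup",
)
print(line)
```

A real submission would also need a version header line and values that satisfy the GO Consortium's validation rules, which this sketch does not attempt to check.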
Project description:The set of annotations at the Saccharomyces Genome Database (SGD) that classifies the cellular function of S. cerevisiae gene products using Gene Ontology (GO) terms has become an important resource for facilitating experimental analysis. In addition to capturing and summarizing experimental results, the structured nature of GO annotations allows for functional comparison across organisms as well as propagation of functional predictions between related gene products. Due to their relevance to many areas of research, ensuring the accuracy and quality of these annotations is a priority at SGD. GO annotations are assigned either manually, by biocurators extracting experimental evidence from the scientific literature, or through automated methods that leverage computational algorithms to predict functional information. Here, we discuss the relationship between literature-based and computationally predicted GO annotations in SGD and extend a strategy whereby comparison of these two types of annotation identifies genes whose annotations need review. Our method, CvManGO (Computational versus Manual GO annotations), pairs literature-based GO annotations with computational GO predictions and evaluates the relationship of the two terms within GO, looking for instances of discrepancy. We found that this method will identify genes that require annotation updates, taking an important step towards finding ways to prioritize literature review. Additionally, we explored factors that may influence the effectiveness of CvManGO in identifying relevant gene targets to find in particular those genes that are missing literature-supported annotations, but our survey found that there are no immediately identifiable criteria by which one could enrich for these under-annotated genes. Finally, we discuss possible ways to improve this strategy, and the applicability of this method to other projects that use the GO for curation. DATABASE URL: http://www.yeastgenome.org.
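The pairing step described above can be sketched as follows. This is an illustration under stated assumptions, not the CvManGO implementation: the toy hierarchy and term names are hypothetical, and a "discrepancy" is assumed to mean that a computational prediction is either more specific than every literature-based annotation ("shallow") or lies in a different branch of the ontology ("mismatch").

```python
# Toy is_a hierarchy, child -> parents (hypothetical names, not real GO terms).
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "lipid metabolism": {"metabolism"},
    "transport": {"root"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def classify(manual, predicted):
    """Describe how a manual term relates to a predicted term."""
    if manual == predicted:
        return "same"
    if manual in ancestors(predicted):
        return "prediction more specific"
    if predicted in ancestors(manual):
        return "manual more specific"
    return "unrelated"


def flag_for_review(manual_terms, predicted_terms):
    """List predictions that no manual annotation already covers."""
    flagged = []
    for predicted in sorted(predicted_terms):
        relations = {classify(manual, predicted) for manual in manual_terms}
        if relations & {"same", "manual more specific"}:
            continue  # the literature annotation already covers this prediction
        reason = "shallow" if "prediction more specific" in relations else "mismatch"
        flagged.append((predicted, reason))
    return flagged


result = flag_for_review({"metabolism"}, {"lipid metabolism", "transport"})
print(result)
```

Genes with at least one flagged prediction would then be candidates for literature review.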
Project description:BACKGROUND: Gene Ontology (GO) is a major bioinformatic resource used for analysis of large biomedical datasets, for example from genome-wide association studies, applied universally across biological fields, including Alzheimer's disease (AD) research. OBJECTIVE: We aim to demonstrate the applicability of GO for interpretation of AD datasets to improve the understanding of the underlying molecular disease mechanisms, including the involvement of inflammatory pathways and dysregulated microRNAs (miRs). METHODS: We have undertaken a systematic full article GO annotation approach focused on microglial proteins implicated in AD and the miRs regulating their expression. PANTHER was used for enrichment analysis of previously published AD data. Cytoscape was used for visualizing and analyzing miR-target interactions captured from published experimental evidence. RESULTS: We contributed 3,084 new annotations for 494 entities, i.e., on average six new annotations per entity. This included a total of 1,352 annotations for 40 prioritized microglial proteins implicated in AD and 66 miRs regulating their expression, yielding an average of twelve annotations per prioritized entity. The updated GO resource was then used to re-analyze previously published data. The re-analysis showed novel processes associated with AD-related genes, not identified in the original study, such as 'gliogenesis', 'regulation of neuron projection development', or 'response to cytokine', demonstrating enhanced applicability of GO for neuroscience research. CONCLUSIONS: This study highlights ongoing development of the neurobiological aspects of GO and demonstrates the value of biocuration activities in the area, thus helping to delineate the molecular bases of AD to aid the development of diagnostic tools and treatments.
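To make the idea of GO term enrichment concrete, here is a minimal sketch of a generic over-representation test based on the hypergeometric distribution. It is not PANTHER's implementation, whose statistical test and multiple-testing correction are not described above; the function name and the numbers in the usage example are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term of interest."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 genes in total, 3 annotated to the term,
# and a study list of 3 genes that are all annotated to it.
p = enrichment_p_value(population=10, annotated=3, sample=3, hits=3)
print(p)
```

Adding annotations to a term changes `annotated` and `hits`, which is why a more complete annotation set can reveal processes that an earlier analysis of the same gene list did not report.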
Project description:Biological processes are accomplished by the coordinated action of gene products. Gene products often participate in multiple processes, and can therefore be annotated to multiple Gene Ontology (GO) terms. Nevertheless, processes that are functionally, temporally and/or spatially distant may have few gene products in common, and co-annotation to unrelated processes probably reflects errors in literature curation, ontology structure or automated annotation pipelines. We have developed an annotation quality control workflow that uses rules based on mutually exclusive processes to detect annotation errors, based on and validated by case studies including the three we present here: fission yeast protein-coding gene annotations over time; annotations for cohesin complex subunits in human and model species; and annotations using a selected set of GO biological process terms in human and five model species. For each case study, we reviewed available GO annotations, identified pairs of biological processes which are unlikely to be correctly co-annotated to the same gene products (e.g. amino acid metabolism and cytokinesis), and traced erroneous annotations to their sources. To date we have generated 107 quality control rules, and corrected 289 manual annotations in eukaryotes and over 52 700 automatically propagated annotations across all taxa.
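A rule of the kind described above can be sketched as a pair of processes that should not be co-annotated to the same gene product. The gene identifiers and the data layout are hypothetical, and the sketch assumes that each gene's term set has already been propagated to ancestor terms; it is not the authors' workflow.

```python
# One illustrative rule, taken from the example pair given in the text.
EXCLUSION_RULES = [("amino acid metabolism", "cytokinesis")]


def find_violations(annotations, rules):
    """Return (gene, term_a, term_b) for every rule a gene violates.

    annotations: dict mapping a gene identifier to its set of process terms.
    """
    violations = []
    for gene in sorted(annotations):
        terms = annotations[gene]
        for term_a, term_b in rules:
            if term_a in terms and term_b in terms:
                violations.append((gene, term_a, term_b))
    return violations


# Hypothetical annotation sets for two genes.
annotations = {
    "geneA": {"amino acid metabolism", "cytokinesis"},
    "geneB": {"cytokinesis"},
}
violations = find_violations(annotations, EXCLUSION_RULES)
print(violations)
```

Each reported violation is only a candidate error: a curator would still need to trace it to its source before correcting anything.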
Project description:BACKGROUND: A large volume of data and information about genes and gene products has been stored in various molecular biology databases. A major challenge for knowledge discovery using these databases is to identify related genes and gene products in disparate databases. The development of Gene Ontology (GO) as a common vocabulary for annotation allows integrated queries across multiple databases and identification of semantically related genes and gene products (i.e., genes and gene products that have similar GO annotations). Meanwhile, dozens of tools have been developed for browsing, mining or editing GO terms, their hierarchical relationships, or their "associated" genes and gene products (i.e., genes and gene products annotated with GO terms). Tools that allow users to directly search and inspect relations among all GO terms and their associated genes and gene products from multiple databases are needed. RESULTS: We present a standalone package called DynGO, which provides several advanced functionalities in addition to the standard browsing capability of the official GO browsing tool (AmiGO). DynGO allows users to conduct batch retrieval of GO annotations for a list of genes and gene products, and semantic retrieval of genes and gene products sharing similar GO annotations. The results are shown in an association tree organized according to GO hierarchies and supported with many dynamic display options such as sorting tree nodes or changing the orientation of the tree. For GO curators and frequent GO users, DynGO provides fast and convenient access to GO annotation data. DynGO is generally applicable to any data set where the records are annotated with GO terms, as illustrated by two examples. CONCLUSION: We have presented a standalone package, DynGO, that provides functionalities to search and browse GO and its association databases as well as several additional functions such as batch retrieval and semantic retrieval.
The complete documentation and software are freely available for download from the website http://biocreative.ifsm.umbc.edu/dyngo.
Project description:Our growing knowledge of viruses reveals how these pathogens manage to evade innate host defenses. A global scheme emerges in which many viruses usurp key cellular defense mechanisms and often inhibit the same components of antiviral signaling. To accurately describe these processes, we have generated a comprehensive dictionary for eukaryotic host-virus interactions. This controlled vocabulary has been detailed in 57 ViralZone resource web pages which contain a global description of all molecular processes. In order to annotate viral gene products with this vocabulary, an ontology has been built in a hierarchy of UniProt Knowledgebase (UniProtKB) keyword terms, and corresponding Gene Ontology (GO) terms have been developed in parallel. The result is a set of 65 UniProtKB keywords related to 57 GO terms, which have been used in 14,390 manual annotations and 908,723 automatic annotations, and propagated to an estimated 922,941 GO annotations. ViralZone pages, UniProtKB keywords and GO terms provide complementary tools to users, and the three resources have been linked to each other through host-virus vocabulary.
Project description:BACKGROUND: The Gene Ontology (GO) is used to describe genes and gene products from many organisms. When used for functional annotation of microarray data, GO is often slimmed by editing so that only higher level terms remain. This practice is designed to improve the summarizing of experimental results by grouping high level terms and the statistical power of GO term enrichment analysis. Here, we propose a new approach to editing the gene ontology, clipping, which is the editing of GO according to biological relevance. Creation of a GO subset by clipping is achieved by removing terms (from all hierarchical levels) if they are not functionally relevant to a given domain of interest. Terms that are located at levels higher than relevant terms are kept; thus, biologically irrelevant terms are only removed if they are not parental to terms that are relevant. RESULTS: Using this approach, we have created the Neural-Immune Gene Ontology (NIGO) subset of GO directed at neurological and immunological systems. We tested the performance of NIGO in extracting knowledge from microarray experiments by conducting functional analysis and comparing the results to those obtained using the full GO and a generic GO slim. NIGO not only improved the statistical scores given to relevant terms, but was also able to retrieve functionally relevant terms that did not pass statistical cutoffs when using the full GO or the slim subset. CONCLUSIONS: Our results validate the pipeline used to generate NIGO, suggesting it is indeed enriched with terms that are specific to the neural/immune domains. The results suggest that NIGO can enhance the analysis of microarray experiments involving neural and/or immune related systems. They also directly demonstrate the potential such a domain-specific GO has in generating meaningful hypotheses.
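The clipping rule described above (drop a term unless it is relevant or is an ancestor of a relevant term) can be sketched on a toy hierarchy. The term names, the hierarchy and the function name are hypothetical, and the sketch does not reproduce the pipeline used to build NIGO, in particular the step that decides which terms count as relevant.

```python
# Toy is_a hierarchy, child -> parents (illustrative names only).
PARENTS = {
    "root": set(),
    "nervous system process": {"root"},
    "synaptic transmission": {"nervous system process"},
    "photosynthesis": {"root"},
    "light harvesting": {"photosynthesis"},
}


def clip(parents, relevant):
    """Keep relevant terms and all of their ancestors; drop everything else."""
    keep = set()
    stack = list(relevant)
    while stack:
        term = stack.pop()
        if term in keep:
            continue
        keep.add(term)
        stack.extend(parents[term])  # ancestors of kept terms are kept too
    return {term: ps for term, ps in parents.items() if term in keep}


clipped = clip(PARENTS, {"synaptic transmission"})
print(sorted(clipped))
```

Because every parent of a kept term is itself kept, the clipped subset remains a connected hierarchy up to the root, unlike a subset built by simply deleting unwanted terms.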
Project description:BACKGROUND: Various measures of semantic similarity of terms in bio-ontologies such as the Gene Ontology (GO) have been used to compare gene products. Such measures of similarity have been used to annotate uncharacterized gene products and group gene products into functional groups. There are various ways to measure semantic similarity, either using the topological structure of the ontology, the instances (gene products) associated with terms or a mixture of both. We focus on an instance level definition of semantic similarity while using the information contained in the ontology, both in the graphical structure of the ontology and the semantics of relations between terms, to provide constraints on our instance level description. Semantic similarity of terms is extended to annotations by various approaches, either through aggregation operations such as min, max and average or through an extrapolative method. These approaches introduce assumptions about how semantic similarity of terms relates to the semantic similarity of annotations that do not necessarily reflect how terms relate to each other. RESULTS: We exploit the semantics of relations in the GO to construct an algorithm called SSA that provides the basis of a framework that naturally extends instance based methods of semantic similarity of terms, such as Resnik's measure, to describing annotations and not just terms. Our measure attempts to correctly interpret how terms combine via their relationships in the ontological hierarchy. SSA uses these relationships to identify the most specific common ancestors between terms. We outline the set of cases in which terms can combine and associate partial order constraints with each case that order the specificity of terms. These cases form the basis for the SSA algorithm. The set of associated constraints also provides a set of principles that any improvement on our method should seek to satisfy.
CONCLUSION: We derive a measure of semantic similarity between annotations that exploits all available information without introducing assumptions about the nature of the ontology or data. We preserve the principles underlying instance based methods of semantic similarity of terms at the annotation level. As a result, our measure better describes the information contained in annotations associated with gene products and is therefore better suited to characterizing and classifying gene products through their annotations.
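For orientation, the instance based term measure that SSA builds on, Resnik's measure, can be sketched as follows: the similarity of two terms is the information content of their most informative common ancestor, where information content is derived from how many gene products are annotated to a term or its descendants. This is only the term-level measure on a hypothetical toy ontology and annotation set; it is not the SSA algorithm, and all names in the code are illustrative.

```python
import math

# Toy is_a hierarchy (child -> parents) and direct annotations (gene -> terms).
PARENTS = {"root": set(), "a": {"root"}, "b": {"a"}, "c": {"a"}, "d": {"root"}}
DIRECT = {"g1": {"b"}, "g2": {"c"}, "g3": {"d"}, "g4": {"b"}}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = log(N / n_t), with n_t genes annotated to t or its descendants."""
    counts = {term: 0 for term in PARENTS}
    for terms in DIRECT.values():
        propagated = set().union(*(ancestors(t) for t in terms))
        for term in propagated:
            counts[term] += 1
    total = len(DIRECT)
    # Terms without any annotated gene have no defined IC and are skipped.
    return {t: math.log(total / n) for t, n in counts.items() if n > 0}


def resnik(term1, term2, ic):
    """IC of the most informative common ancestor of two terms."""
    common = ancestors(term1) & ancestors(term2)
    return max((ic[t] for t in common if t in ic), default=0.0)


ic = information_content()
print(resnik("b", "c", ic), resnik("b", "d", ic))
```

Here "b" and "c" share the ancestor "a", which three of the four genes map to, while "b" and "d" share only the root, whose information content is zero.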
Project description:Autophagy is a fundamental cellular process that is well conserved among eukaryotes. It is one of the strategies that cells use to catabolize substances in a controlled way. Autophagy is used for recycling cellular components, responding to cellular stresses and ridding cells of foreign material. Perturbations in autophagy have been implicated in a number of pathological conditions such as neurodegeneration, cardiac disease and cancer. The growing knowledge about autophagic mechanisms needs to be collected in a computable and shareable format to allow its use in data representation and interpretation. The Gene Ontology (GO) is a freely available resource that describes how and where gene products function in biological systems. It consists of 3 interrelated structured vocabularies that outline what gene products do at the biochemical level, where they act in a cell and the overall biological objectives to which their actions contribute. It also consists of 'annotations' that associate gene products with the terms. Here we describe how we represent autophagy in GO, how we create and define terms relevant to autophagy researchers and how we interrelate those terms to generate a coherent view of the process, therefore allowing an interoperable description of its biological aspects. We also describe how annotation of gene products with GO terms improves data analysis and interpretation, hence bringing a significant benefit to this field of study.