Project description:Advances in genomics have revealed many of the genetic underpinnings of human disease, but exposomics methods are currently inadequate to obtain a similar level of understanding of environmental contributions to human disease. Exposomics methods are limited by the low abundance of xenobiotic metabolites and the lack of authentic standards, which precludes identification using solely mass spectrometry-based criteria. Here, we develop and validate a method for enzymatic generation of xenobiotic metabolites for use with high-resolution mass spectrometry (HRMS) for chemical identification. Generated xenobiotic metabolites were used to confirm the identities of the respective metabolites in mouse and human samples based upon accurate mass, retention time and co-occurrence with related xenobiotic metabolites. The results establish a generally applicable enzyme-based identification (EBI) approach for mass spectrometry identification of xenobiotic metabolites, which could complement existing criteria for chemical identification.
Project description:We present the adaptability of the Mascot search engine for automated identification of intact glycopeptide mass spectra. The steps involved in adopting Mascot for intact glycopeptide analysis include: i) assigning unique one-letter codes for monosaccharides, ii) linearizing glycan sequences and iii) preparing custom glycoprotein databases. Stepped normalized collision energy (NCE) for HCD mostly provided both the peptide and glycan information in a single MS2 spectrum. Using standard glycoproteins, we showed that Mascot can be adopted for automated annotation of both N- and O-linked glycopeptides. In a large-scale validation study, a total of 257 glycoproteins containing 970 unique glycosylation sites and 3447 non-redundant N-linked glycopeptide variants were identified in serum samples. This represents a single tool that collectively allows i) elucidation of N- and O-linked glycopeptide spectra, ii) matching of glycopeptides to known protein sequences, and iii) high-throughput, batch-wise analysis of large-scale glycoproteomics data sets.
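The first two steps above (one-letter codes and linearization) can be illustrated with a minimal sketch. The letter assignments and the composition-based flattening below are assumptions made for illustration only; the description does not state which codes were used or how branched glycans were ordered.

```python
# Hypothetical one-letter codes, chosen from letters not used by the 20
# standard amino acids. The actual assignments are not given in the source.
MONOSACCHARIDE_CODES = {"HexNAc": "J", "Hex": "O", "Fuc": "U", "NeuAc": "B"}


def linearize_glycan(composition):
    """Flatten a glycan composition into a linear string of one-letter codes.

    `composition` maps monosaccharide names to counts, for example
    {"HexNAc": 2, "Hex": 5}. Residues are emitted in the fixed order of
    MONOSACCHARIDE_CODES (an assumption; branching is ignored here).
    """
    unknown = set(composition) - set(MONOSACCHARIDE_CODES)
    if unknown:
        raise ValueError(f"no code assigned for: {sorted(unknown)}")
    return "".join(
        code * composition.get(name, 0)
        for name, code in MONOSACCHARIDE_CODES.items()
    )


if __name__ == "__main__":
    # A high-mannose style composition: 2 HexNAc and 5 Hex.
    print(linearize_glycan({"HexNAc": 2, "Hex": 5}))
```

A string produced this way could then be treated like a peptide-style sequence in a custom database entry, which is the general idea behind step iii.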
Project description:Understanding the complexity of the brain and how it relates to the brain's functional diversity remains a major challenge in biology. Distinct anatomical areas regulate a vast array of processes including organismal homeostasis, cognitive functions and susceptibility to neurological pathologies, many of which define our species. Distal enhancers have emerged as key regulatory elements that acquire epigenetic modifications in a cell-type specific manner, thus enforcing cell- and species-specific gene expression programs. Here, we survey the epigenetic landscape of promoters and cis-regulatory elements in 87 anatomically distinct regions of the human brain, spanning over a hundred different anatomical structures. ChIP-Seq of various regions of the human brain. Also includes mouse and rat samples. Contributor: The Netherlands Brain Bank
Project description:Polymerase chain reaction and restriction endonuclease digest are important techniques that should be included in all Biochemistry and Molecular Biology laboratory curriculums. These techniques are frequently taught at an advanced level, requiring many hours of student and faculty time. Here we present two inquiry-based experiments that are designed for introductory laboratory courses and combine both techniques. In both approaches, students must determine the identity of an unknown DNA sequence, either a gene sequence or a primer sequence, based on a combination of PCR product size and restriction digest pattern. The experimental design is flexible, and can be adapted based on available instructor preparation time and resources, and both approaches can accommodate large numbers of students. We implemented these experiments in our courses with a combined total of 584 students and have an 85% success rate. Overall, students demonstrated an increase in their understanding of the experimental topics, ability to interpret the resulting data, and proficiency in general laboratory skills.
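The identification logic described above (matching an unknown to a candidate by PCR product size and restriction digest pattern) can be sketched as a prediction of fragment sizes for a linear product. This is an illustrative sketch, not the course protocol: the example sequence is invented, and EcoRI (site GAATTC, cutting after the first G) is used only as a familiar enzyme.

```python
def digest_fragment_sizes(sequence, site, cut_offset):
    """Predict fragment lengths from digesting a linear DNA molecule.

    `site` is the recognition sequence on the top strand and `cut_offset`
    is the number of bases into the site at which the top strand is cut.
    Only the top strand is scanned, which is sufficient for palindromic
    sites such as EcoRI's GAATTC.
    """
    seq = sequence.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    boundaries = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(boundaries, boundaries[1:])]


if __name__ == "__main__":
    # Invented 24 bp "PCR product" containing two EcoRI sites.
    product = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TT"
    print(len(product), digest_fragment_sizes(product, "GAATTC", 1))
```

Comparing the predicted sizes for each candidate sequence with the bands seen on a gel is one way to narrow down the identity of the unknown.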
Project description:We developed a novel de-glyco-assisted methylation site identification (DOMAIN) strategy that enables straightforward, fast, and reproducible analysis of protein methylation in a proteome-wide manner. Combining multidimensional fractionation and multiprotease digestion, our method enabled the identification of 573 methylated forms in 270 proteins, including 311 new methylation forms, in A549 cells. Combining this technique with stable isotope labeling quantitative proteomics and RNA interference, we determined the differential regulation of several putative methylated sites that are related to the protein arginine N-methyltransferase 3 (PRMT3). Collectively, our integrated proteomics workflow for comprehensive mapping of methylation sites enables a better understanding of protein methylation, while providing a rapid and effective approach for global protein methylation analysis in biomedical research.
Project description:Compound (or chemical) databases are an invaluable resource for many scientific disciplines. Exposomics researchers need to find and identify relevant chemicals that cover the entirety of potential (chemical and other) exposures over entire lifetimes. This daunting task, with over 100 million chemicals in the largest chemical databases, coupled with broadly acknowledged knowledge gaps in these resources, leaves researchers faced with too much, yet not enough, information at the same time to perform comprehensive exposomics research. Furthermore, the improvements in analytical technologies and computational mass spectrometry workflows, coupled with the rapid growth in databases and increasing demand for high throughput "big data" services from the research community, present significant challenges for both data hosts and workflow developers. This article explores how to reduce candidate search spaces in non-target small molecule identification workflows, while increasing content usability in the context of environmental and exposomics analyses, so as to profit from the increasing size and information content of large compound databases, while increasing efficiency at the same time. In this article, these methods are explored using PubChem, the NORMAN Network Suspect List Exchange and the in silico fragmentation approach MetFrag. A subset of the PubChem database relevant for exposomics, PubChemLite, is presented as a database resource that can be (and has been) integrated into current workflows for high resolution mass spectrometry. Benchmarking datasets from earlier publications are used to show how experimental knowledge and existing datasets can be used to detect and fill gaps in compound databases to progressively improve large resources such as PubChem, and topic-specific subsets such as PubChemLite.
PubChemLite is a living collection, updating as annotation content in PubChem is updated, and exported to allow direct integration into existing workflows such as MetFrag. The source code and files necessary to recreate or adjust this are jointly hosted between the research parties (see data availability statement). This effort shows that enhancing the FAIRness (Findability, Accessibility, Interoperability and Reusability) of open resources can mutually enhance several resources for whole community benefit. The authors explicitly welcome additional community input on ideas for future developments.
Project description:Determining the molecular function of enzymes discovered by genome sequencing represents a primary foundation for understanding many aspects of biology. Historically, classification of enzyme reactions has used the enzyme nomenclature system developed to describe the overall reactions performed by biochemically characterized enzymes, irrespective of their associated sequences. In contrast, functional classification and assignment for the millions of protein sequences of unknown function now available is largely done in two computational steps, first by similarity-based assignment of newly obtained sequences to homologous groups, followed by transferring to them the known functions of similar biochemically characterized homologs. Due to the fundamental differences in their etiologies and practice, how these chemistry- and evolution-centric functional classification systems relate to each other has been difficult to explore on a large scale. To investigate this issue in a new way, we integrated two published ontologies that had previously described each of these classification systems independently. The resulting infrastructure was then used to compare the functional assignments obtained from each classification system for the well-studied and functionally diverse enolase superfamily. Mapping these function assignments to protein structure and reaction similarity networks shows a profound and complex disconnect between the homology- and chemistry-based classification systems. This conclusion mirrors previous observations suggesting that, except for closely related sequences, facile annotation transfer from small numbers of characterized enzymes to the huge number of uncharacterized homologs to which they are related is problematic.
Our extension of these comparisons to large enzyme superfamilies in a computationally intelligent manner provides a foundation for new directions in protein function prediction for the huge proportion of sequences of unknown function represented in major databases. Interactive sequence, reaction, substrate and product similarity networks computed for this work for the enolase and two other superfamilies are freely available for download from the Structure Function Linkage Database Archive (http://sfld.rbvi.ucsf.edu).
Project description:The incidence of Alzheimer's disease (AD) is constantly increasing as the older population grows, and no effective treatment is currently available. In this study, we focused on the identification of AD molecular subtypes to facilitate the development of effective drugs. AD sequencing data collected from the Gene Expression Omnibus (GEO) database were subjected to cluster sample analysis. Each sample module was then identified as a specific AD molecular subtype, and the biological processes and pathways were verified. The main long non-coding RNAs and transcription factors regulating each "typing pathway" and their potential mechanisms were determined using the RNAInter and TRRUST databases. Based on the marker genes of each "typing module," a classifier was developed for molecular typing of AD. According to the pathways involved, five sample clustering modules were identified (mitogen-activated protein kinase, synaptic, autophagy, forkhead box class O, and cell senescence), which may be regulated through multiple pathways. The classifier showed good classification performance, which may be useful for developing novel AD drugs and predicting their indications.
Project description:Metabolite identification is a crucial step in mass spectrometry (MS)-based metabolomics. However, it is still challenging to assess the confidence of assigned metabolites. We report a novel method for estimating the false discovery rate (FDR) of metabolite assignment with a target-decoy strategy, in which the decoys are generated through violating the octet rule of chemistry by adding small odd numbers of hydrogen atoms. The target-decoy strategy was integrated into JUMPm, an automated metabolite identification pipeline for large-scale MS analysis and was also evaluated with two other metabolomics tools, mzMatch and MZmine 2. The reliability of FDR calculation was examined by false data sets, which were simulated by altering MS1 or MS2 spectra. Finally, we used the JUMPm pipeline coupled to the target-decoy strategy to process unlabeled and stable-isotope-labeled metabolomic data sets. The results demonstrate that the target-decoy strategy is a simple and effective method for evaluating the confidence of high-throughput metabolite identification.
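The target-decoy idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the JUMPm implementation: the function names are hypothetical, formulas are represented as simple element-count dictionaries, and the FDR is estimated with the common ratio of accepted decoy hits to accepted target hits, which the description does not spell out.

```python
def make_decoy_formula(formula, extra_hydrogens=1):
    """Build a decoy formula by adding a small odd number of hydrogens.

    `formula` maps element symbols to counts, e.g. {"C": 6, "H": 12, "O": 6}.
    Adding an odd number of H atoms yields a composition that should not
    correspond to a valid neutral molecule, as described in the text.
    """
    if extra_hydrogens < 1 or extra_hydrogens % 2 == 0:
        raise ValueError("extra_hydrogens must be a positive odd number")
    decoy = dict(formula)
    decoy["H"] = decoy.get("H", 0) + extra_hydrogens
    return decoy


def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR as (decoy hits) / (target hits) at a score threshold.

    Assumption: higher scores are better and decoy hits approximate the
    number of false target hits. Returns 0.0 when no target passes.
    """
    targets = sum(score >= threshold for score in target_scores)
    decoys = sum(score >= threshold for score in decoy_scores)
    return decoys / targets if targets else 0.0


if __name__ == "__main__":
    print(make_decoy_formula({"C": 6, "H": 12, "O": 6}))
    print(estimate_fdr([0.9, 0.8, 0.7, 0.4], [0.75, 0.3, 0.2, 0.1], 0.5))
```

In practice the threshold would be raised or lowered until the estimated FDR reaches the level the analyst is willing to accept.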
Project description:Proteolysis is a major form of post-translational modification, which occurs when a protease cleaves peptide bonds in a target protein to modify its activity. Tracking the substrates of a protease is indispensable for understanding its cellular functions. However, it is difficult to directly identify protease substrates because the end products of proteolysis, the cleaved protein fragments, must be identified among the pool of cellular proteins. Here we present a bead-based cleavage (BBC) approach using an immobilized proteome as the screening library to identify protease substrates. This method enables efficient separation of proteolyzed proteins from the background protein mixture. Using caspase-3 as the model protease, we have identified 1159 high-confidence substrates, among which, strikingly, 43.9% undergo degradation during apoptosis. The large number of substrates and the supporting in vivo evidence indicate that the BBC method is a powerful tool for protease substrate identification.