Project description:LC-MS-based untargeted metabolomics is heavily dependent on algorithms for automated peak detection and data preprocessing due to the complexity and size of the raw data generated. These algorithms are generally designed to be as inclusive as possible in order to minimize the number of missed peaks. This is known to result in an abundance of false positive peaks that further complicate downstream data processing and analysis. As a consequence, considerable effort is spent identifying features of interest that might represent peak detection artifacts. Here, we present the CPC algorithm, which allows automated characterization of detected peaks with subsequent filtering of low quality peaks using quality criteria familiar to analytical chemists. We provide a thorough description of the methods in addition to applying the algorithms to authentic metabolomics data. In the example presented, the algorithm removed about 35% of the peaks detected by XCMS, a majority of which exhibited a low signal-to-noise ratio. The algorithm is made available as an R-package and can be fully integrated into a standard XCMS workflow.
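The signal-to-noise criterion mentioned in the abstract above can be illustrated with a minimal sketch. This is not the CPC implementation: the function names, the flank-based noise estimate, and the threshold of 10 are assumptions chosen only for illustration.

```python
from statistics import median, pstdev


def signal_to_noise(trace, apex_index, flank_points=5):
    """Estimate S/N for one chromatographic peak.

    Assumption (hypothetical convention): the first and last `flank_points`
    values of `trace` lie outside the peak and represent baseline noise.
    """
    flank = list(trace[:flank_points]) + list(trace[-flank_points:])
    baseline = median(flank)   # robust baseline level
    noise = pstdev(flank)      # noise taken as the spread of the flanks
    if noise == 0:
        return float("inf")
    return (trace[apex_index] - baseline) / noise


def keep_peak(trace, apex_index, min_sn=10.0):
    """Return True when the peak passes the (assumed) S/N threshold."""
    return signal_to_noise(trace, apex_index) >= min_sn


# Illustrative intensity traces, not real data.
strong_peak = [10, 12, 10, 11, 10, 50, 200, 60, 11, 10, 12, 10, 10]
weak_peak = [10, 12, 10, 11, 10, 11, 13, 11, 11, 10, 12, 10, 10]
print(keep_peak(strong_peak, 6), keep_peak(weak_peak, 6))
```

A filter of this kind would be applied to each detected peak after peak picking. The actual package characterizes peaks with its own criteria, which the abstract does not detail.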
Project description:We present a method for the systematic identification of picogram quantities of new lipids in total extracts of tissues and fluids. It relies on the modularity of lipid structures and applies all-ions fragmentation LC-MS/MS and Arcadiate software to recognize individual modules originating from the same lipid precursor of known or assumed structure. In this way it alleviates the need to recognize and fragment very low abundant precursors of novel molecules in complex lipid extracts. In a single analysis of rat kidney extract the method identified 58 known and discovered 74 novel endogenous endocannabinoids and endocannabinoid-related molecules, including a novel class of N-acylaspartates that inhibit Hedgehog signaling while having no impact on endocannabinoid receptors.
Project description:Background: Cancer stem cells (CSCs) have been identified in osteosarcoma (OS) and are considered resistant to chemotherapeutic agents. However, the mechanism by which osteosarcoma stem cells (OSCs) resist chemotherapy remains unclear, and the metabolomic features of OSCs have not been characterized. Materials and methods: OSCs were isolated using a sphere-forming assay and identified. Untargeted LC-MS/MS analysis was performed to reveal the metabolomic features of OSCs and the mechanisms underlying OSC resistance to methotrexate (MTX). Results: OSCs were efficiently isolated and identified from the human OS cell lines 143B and MG63 and showed enhanced chemoresistance to MTX. The untargeted LC-MS analysis revealed that OSCs showed differential metabolites and perturbed signaling pathways, mainly involved in fatty acid, amino acid, carbohydrate and nucleic acid metabolism. After MTX treatment, the metabolomic features of OSCs mainly involved amino acid, fatty acid, energy and nucleic acid metabolism. Moreover, when the response of OSCs to MTX was compared with that of their parental OS cells, the differential metabolites and perturbed signaling pathways were mainly involved in amino acid, fatty acid and nucleic acid metabolism. In addition, the Rap1 and Ras signaling pathways were involved in the response of OS cells and their stem cells to MTX. Conclusion: The sphere-forming assay efficiently isolated OSCs from human OS cell lines, and untargeted LC-MS/MS analysis appears to be a suitable method for investigating the metabolomic features of OS cells and OSCs. Moreover, the metabolomic features of the OSC response to MTX might provide further insight into chemotherapeutic resistance in OS.
Project description:Background: Untargeted metabolomics datasets contain large proportions of uninformative features that can impede subsequent statistical analysis such as biomarker discovery and metabolic pathway analysis. Thus, there is a need for versatile and data-adaptive methods for filtering data prior to investigating the underlying biological phenomena. Here, we propose a data-adaptive pipeline for filtering metabolomics data that are generated by liquid chromatography-mass spectrometry (LC-MS) platforms. Our data-adaptive pipeline includes novel methods for filtering features based on blank samples, proportions of missing values, and estimated intra-class correlation coefficients. Results: Using metabolomics datasets that were generated in our laboratory from samples of human blood, as well as two public LC-MS datasets, we compared our data-adaptive filtering method with traditional methods that rely on non-method specific thresholds. The data-adaptive approach outperformed traditional approaches in terms of removing noisy features and retaining high quality, biologically informative ones. The R code for running the data-adaptive filtering method is provided at https://github.com/courtneyschiffman/Metabolomics-Filtering. Conclusions: Our proposed data-adaptive filtering pipeline is intuitive and effectively removes uninformative features from untargeted metabolomics datasets. It is particularly relevant for interrogation of biological phenomena in data derived from complex matrices associated with biospecimens.
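One of the filtering criteria named above is the intra-class correlation coefficient (ICC). As a rough illustration of what such an estimate measures, the sketch below computes the classical one-way random-effects ICC for a single feature from replicate measurements per subject. The choice of this particular estimator and the function name `icc_oneway` are assumptions; the pipeline's own R code may estimate the ICC differently.

```python
def icc_oneway(groups):
    """One-way random-effects ICC for equally sized replicate groups.

    groups: list of lists, one inner list of replicate intensities per subject.
    Returns (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are the
    between-subject and within-subject mean squares and k is the number of
    replicates per subject.
    """
    n = len(groups)
    k = len(groups[0])
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 subjects with the same number (>= 2) of replicates")
    grand_mean = sum(sum(g) for g in groups) / (n * k)
    means = [sum(g) / k for g in groups]
    msb = k * sum((m - grand_mean) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    if denom == 0:
        return float("nan")  # constant feature: the ICC is undefined
    return (msb - msw) / denom


# Illustrative values only: a reproducible feature and a noisy one.
reproducible = [[10.0, 10.2], [12.0, 11.8], [14.1, 13.9]]
noisy = [[1.0, 3.0], [3.0, 1.0], [2.0, 2.0]]
print(icc_oneway(reproducible), icc_oneway(noisy))
```

A feature whose replicate variation is small relative to between-subject variation has an ICC near 1, while a feature dominated by technical noise has an ICC near or below 0, which is why the ICC is a natural filtering statistic.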
Project description:Pooled quality controls (QCs) are usually implemented within untargeted methods to improve the quality of datasets by removing features either not detected or not reproducible. However, this approach can be limiting in exposomics studies conducted on groups of exposed and nonexposed subjects, as compounds present at low levels only in exposed subjects can be diluted and thus not detected in the pooled QC. The aim of this work is to develop and apply an untargeted workflow for human biomonitoring in urine samples, implementing a novel separated approach for preparing pooled quality controls. An LC-MS/MS workflow was developed and applied to a case study of smoking and non-smoking subjects. Three different pooled quality controls were prepared: mixing an aliquot from every sample (QC-T), only from non-smokers (QC-NS), and only from smokers (QC-S). The feature tables were filtered using QC-T (T-feature list), QC-S, and QC-NS, separately. The last two feature lists were merged (SNS-feature list). A higher number of features was obtained with the SNS-feature list than the T-feature list, resulting in identification of a higher number of biologically significant compounds. The separated pooled QC strategy implemented can improve the nontargeted human biomonitoring for groups of exposed and nonexposed subjects.
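The filtering-and-merging logic described above can be sketched as set operations. The detection-rate rule, the 50% threshold, and the feature identifiers below are hypothetical; the abstract does not state the exact filtering criteria used.

```python
def features_passing_qc(qc_table, min_detection=0.5):
    """Keep features detected in at least `min_detection` of the QC injections.

    qc_table maps a feature id to its intensities across the QC injections,
    with None marking "not detected". The threshold is an assumed value.
    """
    kept = set()
    for feature_id, intensities in qc_table.items():
        detected = sum(value is not None for value in intensities)
        if detected / len(intensities) >= min_detection:
            kept.add(feature_id)
    return kept


# Illustrative tables: feature_A is present only in exposed subjects and is
# diluted below detection in the total pool (QC-T).
qc_t = {"feature_A": [None, None, 40.0], "feature_B": [900.0, 910.0, 905.0]}
qc_s = {"feature_A": [400.0, 390.0, 410.0], "feature_B": [880.0, 905.0, 899.0]}
qc_ns = {"feature_A": [None, None, None], "feature_B": [915.0, 920.0, 910.0]}

t_list = features_passing_qc(qc_t)
# SNS list: union of the lists filtered separately with QC-S and QC-NS.
sns_list = features_passing_qc(qc_s) | features_passing_qc(qc_ns)
print(sorted(t_list), sorted(sns_list))
```

In this toy case the merged SNS list retains the exposure-specific feature that the total pool loses, which mirrors the larger SNS-feature list reported in the abstract.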
Project description:Untargeted metabolomics using high-resolution liquid chromatography-mass spectrometry (LC-MS) is becoming one of the major areas of high-throughput biology. Functional analysis, that is, analyzing the data based on metabolic pathways or the genome-scale metabolic network, is critical in feature selection and interpretation of metabolomics data. One of the main challenges in the functional analyses is the lack of the feature identity in the LC-MS data itself. By matching mass-to-charge ratio (m/z) values of the features to theoretical values derived from known metabolites, some features can be matched to one or more known metabolites. When multiple matchings occur, in most cases only one of the matchings can be true. At the same time, some known metabolites are missing in the measurements. Current network/pathway analysis methods ignore the uncertainty in metabolite identification and the missing observations, which could lead to errors in the selection of significant subnetworks/pathways. In this paper, we propose a flexible network feature selection framework that combines metabolomics data with the genome-scale metabolic network. The method adopts a sequential feature screening procedure and machine learning-based criteria to select important subnetworks and identify the optimal feature matching simultaneously. Simulation studies show that the proposed method has a much higher sensitivity than the commonly used maximal matching approach. For demonstration, we apply the method on a cohort of healthy subjects to detect subnetworks associated with the body mass index (BMI). The method identifies several subnetworks that are supported by the current literature, as well as detects some subnetworks with plausible new functional implications. The R code is available at http://web1.sph.emory.edu/users/tyu8/MSS.
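The m/z matching step described above, and the ambiguity it creates, can be illustrated with a short sketch. It assumes protonated ions ([M+H]+) and a 10 ppm tolerance. Both are assumptions, as are the function name and the tiny metabolite table, and this is not the authors' subnetwork selection method.

```python
PROTON_MASS = 1.007276  # Da, mass added for an [M+H]+ ion


def match_features(feature_mzs, metabolite_masses, ppm=10.0):
    """Match measured m/z values to theoretical [M+H]+ values.

    metabolite_masses maps a metabolite name to its monoisotopic mass (Da).
    Returns a dict from each measured m/z to the list of matching names.
    """
    matches = {}
    for mz in feature_mzs:
        hits = []
        for name, mono_mass in metabolite_masses.items():
            theoretical = mono_mass + PROTON_MASS
            if abs(mz - theoretical) / theoretical * 1e6 <= ppm:
                hits.append(name)
        matches[mz] = hits
    return matches


# Leucine and isoleucine are isomers (C6H13NO2) and share the same mass.
metabolites = {
    "leucine": 131.094629,
    "isoleucine": 131.094629,
    "glucose": 180.063388,
}
result = match_features([132.1019, 181.0707, 250.0], metabolites)
print(result)
```

The first feature matches two metabolites, and at most one of the two matches can be correct. The third feature matches nothing. These are the two kinds of uncertainty that the proposed framework is designed to handle.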
Project description:Introduction: The metabolomics quality assurance and quality control consortium (mQACC) evolved from the recognized need for a community-wide consensus on improving and systematizing quality assurance (QA) and quality control (QC) practices for untargeted metabolomics. Objectives: In this work, we sought to identify and share the common and divergent QA and QC practices amongst mQACC members and collaborators who use liquid chromatography-mass spectrometry (LC-MS) in untargeted metabolomics. Methods: All authors voluntarily participated in this collaborative research project by providing the details of and insights into the QA and QC practices used in their laboratories. This sharing was enabled via a six-page questionnaire composed of over 120 questions and comment fields, which was developed as part of this work and has provided the basis for ongoing mQACC outreach. Results: For QA, many laboratories reported documenting maintenance, calibration and tuning (82%); having established data storage and archival processes (71%); depositing data in public repositories (55%); having standard operating procedures (SOPs) in place for all laboratory processes (68%) and training staff on laboratory processes (55%). For QC, universal practices included using system suitability procedures (100%) and using a robust system of identification (Metabolomics Standards Initiative level 1 identification standards) for at least some of the detected compounds. Most laboratories used QC samples (>86%); used internal standards (91%); used a designated analytical acquisition template with randomized experimental samples (91%); and manually reviewed peak integration following data acquisition (86%). A minority of laboratories included technical replicates of experimental samples in their workflows (36%). Conclusions: Although the 23 contributors were researchers with diverse and international backgrounds from academia, industry and government, they are not necessarily representative of the worldwide pool of practitioners due to the recruitment method for participants and its voluntary nature. However, both the questionnaire and the findings presented here have already informed and led other data gathering efforts by mQACC at conferences and other outreach activities and will continue to evolve in order to guide discussions for recommendations of best practices within the community and to establish internationally agreed upon reporting standards. We very much welcome further feedback from readers of this article.
Project description:Motivation: When metabolites are analyzed by electrospray ionization (ESI)-mass spectrometry, they are usually detected as multiple ion species due to the presence of isotopes, adducts and in-source fragments. The signals generated by these degenerate features (along with contaminants and other chemical noise) obscure meaningful patterns in MS data, complicating both compound identification and downstream statistical analysis. To address this problem, we developed Binner, a new tool for the discovery and elimination of many degenerate feature signals typically present in untargeted ESI-LC-MS metabolomics data. Results: Binner generates feature annotations and provides tools to help users visualize informative feature relationships that can further elucidate the underlying structure of the data. To demonstrate the utility of Binner and to evaluate its performance, we analyzed data from reversed phase LC-MS and hydrophilic interaction chromatography (HILIC) platforms and demonstrated the accuracy of selected annotations using MS/MS. When we compared Binner annotations of 75 compounds previously identified in human plasma samples with annotations generated by three similar tools, we found that Binner achieves superior performance in the number and accuracy of annotations while simultaneously minimizing the number of incorrectly annotated principal ions. Data reduction and pattern exploration with Binner have allowed us to catalog a number of previously unrecognized complex adducts and neutral losses generated during the ionization of molecules in LC-MS. In summary, Binner allows users to explore patterns in their data and to efficiently and accurately eliminate a significant number of the degenerate features typically found in various LC-MS modalities. Availability and implementation: Binner is written in Java and is freely available from http://binner.med.umich.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Project description:Untargeted metabolomics can detect more than 10 000 peaks in a single LC-MS run. The correspondence between these peaks and metabolites, however, remains unclear. Here, we introduce a Peak Annotation and Verification Engine (PAVE) for annotating untargeted microbial metabolomics data. The workflow involves growing cells in 13C and 15N isotope-labeled media to identify peaks from biological compounds and their carbon and nitrogen atom counts. Improved deisotoping and deadducting are enabled by algorithms that integrate positive mode, negative mode, and labeling data. To distinguish metabolites and their fragments, PAVE experimentally measures the response of each peak to weak in-source collision induced dissociation, which increases the peak intensity for fragments while decreasing it for their parent ions. The molecular formulas of the putative metabolites are then assigned based on database searching using both m/z and C/N atom counts. Application of this procedure to Saccharomyces cerevisiae and Escherichia coli revealed that more than 80% of peaks do not label, i.e., are environmental contaminants. More than 70% of the biological peaks are isotopic variants, adducts, fragments, or mass spectrometry artifacts yielding ∼2000 apparent metabolites across the two organisms. About 650 match to a known metabolite formula based on m/z and C/N atom counts, with 220 assigned structures based on MS/MS and/or retention time to match to authenticated standards. Thus, PAVE enables systematic annotation of LC-MS metabolomics data with only ∼4% of peaks annotated as apparent metabolites.
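The idea of reading atom counts from isotope labeling can be illustrated with simple arithmetic: in fully 13C-labeled cells each carbon shifts the ion by about 1.003355 Da, and in fully 15N-labeled cells each nitrogen shifts it by about 0.997035 Da. The sketch below assumes complete labeling and singly charged ions. The function name and the example m/z values (chosen to resemble protonated glutamate, C5H9NO4) are illustrative and not taken from PAVE.

```python
C13_SHIFT = 1.003355  # Da, mass difference between 13C and 12C
N15_SHIFT = 0.997035  # Da, mass difference between 15N and 14N


def atom_count(mz_unlabeled, mz_labeled, shift_per_atom, charge=1):
    """Infer the number of labeled atoms from the m/z shift on full labeling."""
    mass_shift = (mz_labeled - mz_unlabeled) * charge
    return round(mass_shift / shift_per_atom)


# Illustrative m/z values for one peak in unlabeled, 13C and 15N media.
mz_unlabeled = 148.0604
mz_13c = 153.0772
mz_15n = 149.0575

carbons = atom_count(mz_unlabeled, mz_13c, C13_SHIFT)
nitrogens = atom_count(mz_unlabeled, mz_15n, N15_SHIFT)
print(carbons, nitrogens)
```

A peak that shows no shift in labeled media gives a count of zero for both elements, consistent with the abstract's treatment of non-labeling peaks as environmental contaminants. Knowing the C and N counts also narrows the set of candidate formulas for a given m/z.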