Joint Analysis of Dependent Features within Compound Spectra Can Improve Detection of Differential Features.
ABSTRACT: Mass spectrometry is an important analytical technology in metabolomics. After the initial feature detection and alignment steps, the raw data processing results in a high-dimensional data matrix of mass spectral features, which is then subjected to further statistical analysis. Univariate tests like Student's t-test and Analysis of Variances (ANOVA) are hypothesis tests, which aim to detect differences between two or more sample classes, e.g., wildtype-mutant or between different doses of treatments. In both cases, one of the underlying assumptions is the independence between metabolic features. However, in mass spectrometry, a single metabolite usually gives rise to several mass spectral features, which are observed together and show a common behavior. This paper suggests to group the related features of metabolites with CAMERA into compound spectra, and then to use a multivariate statistical method to test whether a compound spectrum (and thus the actual metabolite) is differential between two sample classes. The multivariate method is first demonstrated with an analysis between wild-type and an over-expression line of the model plant Arabidopsis thaliana. For a quantitative evaluation data sets with a simulated known effect between two sample classes were analyzed. The spectra-wise analysis showed better detection results for all simulated effects.
Project description:Small molecules are central to all biological processes and metabolomics becoming an increasingly important discovery tool. Robust, accurate and efficient experimental approaches are critical to supporting and validating predictions from post-genomic studies. To accurately predict metabolic changes and dynamics, experimental design requires multiple biological replicates and usually multiple treatments. Mass spectra from each run are processed and metabolite features are extracted. Because of machine resolution and variation in replicates, one metabolite may have different implementations (values) of retention time and mass in different spectra. A major impediment to effectively utilizing untargeted metabolomics data is ensuring accurate spectral alignment, enabling precise recognition of features (metabolites) across spectra. Existing alignment algorithms use either a global merge strategy or a local merge strategy. The former delivers an accurate alignment, but lacks efficiency. The latter is fast, but often inaccurate. Here we document a new algorithm employing a technique known as quicksort. The results on both simulated data and real data show that this algorithm provides a dramatic increase in alignment speed and also improves alignment accuracy.
Project description:Culturomics-based bacterial diversity studies benefit from the implementation of MALDI-TOF MS to remove genomically redundant isolates from isolate collections. We previously introduced SPeDE, a novel tool designed to dereplicate spectral datasets at an infraspecific level into operational isolation units (OIUs) based on unique spectral features. However, biological and technical variation may result in methodology-induced differences in MALDI-TOF mass spectra and hence provoke the detection of genomically redundant OIUs. In the present study, we used three datasets to analyze to which extent hierarchical clustering and network analysis allowed to eliminate redundant OIUs obtained through biological and technical sample variation and to describe the diversity within a set of spectra obtained from 134 unknown soil isolates. Overall, network analysis based on unique spectral features in MALDI-TOF mass spectra enabled a superior selection of genomically diverse OIUs compared to hierarchical clustering analysis and provided a better understanding of the inter-OIU relationships.
Project description:Despite the increasing importance of non-targeted metabolomics to answer various life science questions, extracting biochemically relevant information from metabolomics spectral data is still an incompletely solved problem. Most computational tools to identify tandem mass spectra focus on a limited set of molecules of interest. However, such tools are typically constrained by the availability of reference spectra or molecular databases, limiting their applicability of generating structural hypotheses for unknown metabolites. In contrast, recent advances in the field illustrate the possibility to expose the underlying biochemistry without relying on metabolite identification, in particular via substructure prediction. We describe an automated method for substructure recommendation motivated by association rule mining. Our framework captures potential relationships between spectral features and substructures learned from public spectral libraries. These associations are used to recommend substructures for any unknown mass spectrum. Our method does not require any predefined metabolite candidates, and therefore it can be used for the hypothesis generation or partial identification of unknown unknowns. The method is called MESSAR (MEtabolite SubStructure Auto-Recommender) and is implemented in a free online web service available at messar.biodatamining.be.
Project description:Supervised modeling of mass spectrometry imaging (MSI) data is a crucial component for the detection of the distinct molecular characteristics of cancerous tissues. Currently, two types of supervised analyses are mainly used on MSI data: pixel-wise segmentation of sample images and whole-sample-based classification. A large number of mass spectra associated with each MSI sample can represent a challenge for designing models that simultaneously preserve the overall molecular content while capturing valuable information contained in the MSI data. Furthermore, intensity-related batch effects can introduce biases in the statistical models. Here we introduce a method based on ion colocalization features that allows the classification of whole tissue specimens using MSI data, which naturally preserves the spatial information associated the with the mass spectra and is less sensitive to possible batch effects. Finally, we propose data visualization strategies for the inspection of the derived networks, which can be used to assess whether the correlation differences are related to coexpression/suppression or disjoint spatial localization patterns and can suggest hypotheses based on the underlying mechanisms associated with the different classes of analyzed samples.
Project description:The metabolite profiling of a NIST plasma Standard Reference Material (SRM 1950) on different liquid chromatography/mass spectrometry (LC/MS) platforms showed significant differences. Although these findings suggest caution when interpreting metabolomics results, the degree of overlap of both profiles allowed us to use tandem mass spectral libraries of recurrent spectra to evaluate to what extent these results are transferable across platforms and to develop cross-platform chemical signatures.Non-targeted global metabolite profiles of SRM 1950 were obtained on different LC/MS platforms using reversed-phase chromatography and different chromatographic scales (conventional HPLC, UHPLC and nanoLC). The data processing and the metabolite differential analysis were carried out using publically available (XCMS), proprietary (Mass Profiler Professional) and in-house software (NIST pipeline).Repeatability and intermediate precision showed that the non-targeted SRM 1950 profiling was highly reproducible when working on the same platform (relative standard deviation (RSD) <2%); however, substantial differences were found in the LC/MS patterns originating on different platforms or even using different chromatographic scales (conventional HPLC, UHPLC and nanoLC) on the same platform. A substantial degree of overlap (common molecular features) was also found. A procedure to generate consistent chemical signatures using tandem mass spectral libraries of recurrent spectra is proposed.Different platforms rendered significantly different metabolite profiles, but the results were highly reproducible when working within one platform. Tandem mass spectral libraries of recurrent spectra are proposed to evaluate the degree of transferability of chemical signatures generated on different platforms. Chemical signatures based on our procedure are most likely cross-platform transferable.
Project description:Quantifying the similarity of spectra is an important task in various areas of spectroscopy, for example, to identify a compound by comparing sample spectra to those of reference standards. In mass spectrometry based discovery proteomics, spectral comparisons are used to infer the amino acid sequence of peptides. In targeted proteomics by selected reaction monitoring (SRM) or SWATH MS, predetermined sets of fragment ion signals integrated over chromatographic time are used to identify target peptides in complex samples. In both cases, confidence in peptide identification is directly related to the quality of spectral matches. In this study, we used sets of simulated spectra of well-controlled dissimilarity to benchmark different spectral comparison measures and to develop a robust scoring scheme that quantifies the similarity of fragment ion spectra. We applied the normalized spectral contrast angle score to quantify the similarity of spectra to objectively assess fragment ion variability of tandem mass spectrometric datasets, to evaluate portability of peptide fragment ion spectra for targeted mass spectrometry across different types of mass spectrometers and to discriminate target assays from decoys in targeted proteomics. Altogether, this study validates the use of the normalized spectral contrast angle as a sensitive spectral similarity measure for targeted proteomics, and more generally provides a methodology to assess the performance of spectral comparisons and to support the rational selection of the most appropriate similarity measure. The algorithms used in this study are made publicly available as an open source toolset with a graphical user interface.
Project description:This study reports on the use of traveling wave ion mobility quadrupole time-of-flight (ToF) mass spectrometry for plasma metabolomics. Plasma metabolite profiles of obese Zucker fa/fa rats were obtained after the administration of different oral doses of Xanthohumol; a hop-derived dietary supplement. Liquid chromatography coupled data independent tandem mass spectrometry (LC-MSE) and LC-ion mobility spectrometry (IMS)-MSE acquisitions were conducted in both positive and negative modes using a Synapt G2 High Definition Mass Spectrometry (HDMS) instrument. This method provides identification of metabolite classes in rat plasma using parallel alternating low energy and high energy collision spectral acquisition modes. Data sets were analyzed using pattern recognition methods. Statistically significant (p < 0.05 and fold change (FC) threshold > 1.5) features were selected to identify the up-/down-regulated metabolite classes. Ion mobility data visualized using drift scope software provided a graphical read-out of differences in metabolite classes.
Project description:Metabolite annotation and identification are primary challenges in untargeted metabolomics experiments. Rigorous workflows for reliable annotation of mass features with chemical structures or compound classes are needed to enhance the power of untargeted mass spectrometry. High-resolution mass spectrometry considerably improves the confidence in assigning elemental formulas to mass features in comparison to nominal mass spectrometry, and embedding of fragmentation methods enables more reliable metabolite annotations and facilitates metabolite classification. However, the analysis of mass fragmentation spectra can be a time-consuming step and requires expert knowledge. This study demonstrates how characteristic fragmentations, specific to compound classes, can be used to systematically analyze their presence in complex biological extracts like urine that have undergone untargeted mass spectrometry combined with data dependent or targeted fragmentation. Human urine extracts were analyzed using normal phase liquid chromatography (hydrophilic interaction chromatography) coupled to an Ion Trap-Orbitrap hybrid instrument. Subsequently, mass chromatograms and collision-induced dissociation and higher-energy collisional dissociation (HCD) fragments were annotated using the freely available MAGMa software. Acylcarnitines play a central role in energy metabolism by transporting fatty acids into the mitochondrial matrix. By filtering on a combination of a mass fragment and neutral loss designed based on the MAGMa fragment annotations, we were able to classify and annotate 50 acylcarnitines in human urine extracts, based on high-resolution mass spectrometry HCD fragmentation spectra at different energies for all of them. Of these annotated acylcarnitines, 31 are not described in HMDB yet and for only 4 annotated acylcarnitines the fragmentation spectra could be matched to reference spectra. Therefore, we conclude that the use of mass fragmentation filters within the context of untargeted metabolomics experiments is a valuable tool to enhance the annotation of small metabolites.
Project description:Although physicochemical fractionation techniques play a crucial role in the analysis of complex mixtures, they are not necessarily the best solution to separate specific molecular classes, such as lipids and peptides. Any physical fractionation step such as, for example, those based on liquid chromatography, will introduce its own variation and noise. In this paper we investigate to what extent the high sensitivity and resolution of contemporary mass spectrometers offers viable opportunities for computational separation of signals in full scan spectra. We introduce an automatic method that can discriminate peptide from lipid peaks in full scan mass spectra, based on their isotopic properties. We systematically evaluate which features maximally contribute to a peptide versus lipid classification. The selected features are subsequently used to build a random forest classifier that enables almost perfect separation between lipid and peptide signals without requiring ion fragmentation and classical tandem MS-based identification approaches. The classifier is trained on in silico data, but is also capable of discriminating signals in real world experiments. We evaluate the influence of typical data inaccuracies of common classes of mass spectrometry instruments on the optimal set of discriminant features. Finally, the method is successfully extended towards the classification of individual lipid classes from full scan mass spectral features, based on input data defined by the Lipid Maps Consortium.
Project description:Nuclear magnetic resonance spectroscopy is one of the primary tools in metabolomics analyses, where it is used to track and quantify changes in metabolite concentrations or profiles in response to perturbation through disease, toxicants or drugs. The spectra generated through such analyses are typically confounded by noise of various types, obscuring the signals and hindering downstream statistical analysis. Such issues are becoming increasingly significant as greater numbers of large-scale systems or longitudinal studies are being performed, in which many spectra from different conditions need to be compared simultaneously.We describe a novel approach, termed Progressive Consensus Alignment of Nmr Spectra (PCANS), for the alignment of NMR spectra. Through the progressive integration of many pairwise comparisons, this approach generates a single consensus spectrum as an output that is then used to adjust the chemical shift positions of the peaks from the original input spectra to their final aligned positions. We characterize the performance of PCANS by aligning simulated NMR spectra, which have been provided with user-defined amounts of chemical shift variation as well as inter-group differences as would be observed in control-treatment applications. Moreover, we demonstrate how our method provides better performance than either template-based alignment or binning. Finally, we further evaluate this approach in the alignment of real mouse urine spectra and demonstrate its ability to improve downstream PCA and PLS analyses.By avoiding the use of a template or reference spectrum, PCANS allows for the creation of a consensus spectrum that enhances the signals within the spectra while maintaining sample-specific features. This approach is of greatest benefit when complex samples are being analyzed and where it is expected that there will be spectral features unique and/or strongly different between subgroups within the samples. Furthermore, this approach can be potentially applied to the alignment of any data having spectra-like properties.