Project description:Mass spectrometry (MS) coupled to liquid chromatography (LC) is a commonly used technique in metabolomic and proteomic research. As the size and complexity of LC-MS-based experiments grow, it becomes increasingly difficult to perform quality control of both raw data and processing results. In a practical setting, quality control steps for raw LC-MS data are often overlooked, and assessment of an experiment's success is based on derived metrics such as "the number of identified compounds". The human brain interprets visual data much better than plain text, hence the saying "a picture is worth a thousand words". Here, we present the BatMass software package, which allows quick quality control of raw LC-MS data through its fast visualization capabilities. It also serves as a testbed for developers of LC-MS data processing algorithms by providing a data access library for open mass spectrometry file formats and a means of visually mapping processing results back to the original data. We illustrate the utility of BatMass with several use cases of quality control and data exploration.
Project description:Hydrophilic interaction chromatography (HILIC) liquid chromatography/mass spectrometry (LC/MS) is appropriate for all native and reductively aminated glycan classes. HILIC carries the advantage that retention times vary predictably according to oligosaccharide composition. Chromatographic conditions are compatible with sensitive and reproducible glycomics analysis of large numbers of samples. The data are extremely useful for quantitative profiling of glycans expressed in biological tissues. With these analytical developments, the rate-limiting factor for widespread use of HILIC LC/MS in glycomics is the analysis of the data. In order to eliminate this problem, a Java-based open source software tool, Manatee, was developed for targeted analysis of HILIC LC/MS glycan datasets. This tool uses user-defined lists of compositions that specify the glycan chemical space in a given biological context. The program accepts high-resolution LC/MS data using the public mzXML format and is capable of processing a large data file in a few minutes on a standard desktop computer. The program allows mining of HILIC LC/MS data with an output compatible with multivariate statistical analysis. It is envisaged that the Manatee tool will complement more computationally intensive LC/MS processing tools based on deconvolution and deisotoping of LC/MS data. The capabilities of the tool were demonstrated using a set of HILIC LC/MS data on organ-specific heparan sulfates.
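The targeted analysis described above can be illustrated with a minimal Python sketch. It is not Manatee's actual code (Manatee is Java-based and reads mzXML). The function names, the in-memory scan layout, and the 10 ppm default tolerance are assumptions made for illustration, and the target m/z values are taken as already computed from the user-defined composition list.

```python
def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds of a +/- ppm window around mz."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def extract_targets(scans, targets, ppm=10.0):
    """Sum the intensity found for each target m/z across all scans.

    scans:   list of (retention_time, [(mz, intensity), ...]) tuples
             (hypothetical in-memory layout, not the mzXML structure)
    targets: dict mapping a composition name to its expected m/z
    Returns a dict mapping each composition name to its summed intensity.
    """
    totals = {name: 0.0 for name in targets}
    for _rt, peaks in scans:
        for name, target_mz in targets.items():
            low, high = ppm_window(target_mz, ppm)
            totals[name] += sum(i for mz, i in peaks if low <= mz <= high)
    return totals
```

One row of such totals per data file gives a samples-by-compositions table, which is the kind of output that can be passed to multivariate statistical analysis.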
Project description:<b>Background:</b> An untargeted chemical analysis of bio-fluids provides semi-quantitative data for thousands of chemicals, expanding our understanding of the relationships among metabolic pathways, diseases, phenotypes and exposures. During the processing of mass spectral and chromatography data, various signal thresholds are used to control the number of peaks in the final data matrix that is used for statistical analyses. However, commonly used stringent thresholds generate constrained data matrices which may under-represent the detected chemical space, leading to missed biological insights in exposome research. <b>Methods:</b> We have re-analyzed a liquid chromatography high resolution mass spectrometry data set for a publicly available epidemiology study (<i>n</i> = 499) of human cord blood samples using the MS-DIAL software with the minimum possible thresholds during the data processing steps. Peak lists for individual files and the data matrix after alignment and gap-filling steps were summarized for different peak height and detection frequency thresholds. Correlations between birth weight and LC/MS peaks in the newly generated data matrix were computed using the Spearman correlation coefficient. <b>Results:</b> MS-DIAL software detected an average of 23,156 peaks per individual LC/MS file and 63,393 peaks in the aligned peak table. The combination of peak height and detection frequency thresholds that was used in the original publication at the individual file and the peak alignment levels can reject 90% of the peaks from the untargeted chemical analysis dataset that was generated by MS-DIAL. Correlation analysis for birth weight data suggested that up to 80% of the significantly associated peaks were rejected by the data processing thresholds that were used in the original publication. 
The re-analysis with minimum possible thresholds recovered metabolic insights about C19 steroids and hydroxy-acyl-carnitines and their relationships with birth weight. <b>Conclusions:</b> Data processing thresholds for peak height and detection frequency at the individual data file and alignment levels should be set to the minimum possible level or avoided completely when mining untargeted chemical analysis data in exposome research for new biomarkers and mechanisms.
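To make the effect of the two thresholds concrete, the following Python sketch filters an aligned peak table by peak height and detection frequency. It is an illustration under stated assumptions, not the MS-DIAL implementation or the original study's pipeline. The function names, the dictionary layout, and the use of `None` for an undetected peak are all hypothetical.

```python
def filter_peaks(matrix, min_height, min_detection_frequency):
    """Keep peaks detected above min_height in enough samples.

    matrix: dict mapping a peak id to a list of peak heights, one per
            sample, with None where the peak was not detected
            (hypothetical layout of an aligned peak table)
    min_height: a sample counts as a detection only if height >= this
    min_detection_frequency: required fraction of samples (0.0 to 1.0)
    """
    kept = {}
    for peak_id, heights in matrix.items():
        detections = [h for h in heights if h is not None and h >= min_height]
        frequency = len(detections) / len(heights)
        if frequency >= min_detection_frequency:
            kept[peak_id] = heights
    return kept


def rejected_fraction(matrix, kept):
    """Fraction of peaks in the aligned table removed by the thresholds."""
    return 1.0 - len(kept) / len(matrix)
```

Running the filter once with stringent settings and once with both thresholds at zero, then comparing the two results, mirrors the comparison the abstract reports.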
Project description:BACKGROUND: Mass Spectrometry coupled to Liquid Chromatography (LC-MS) is commonly used to analyze the protein content of biological samples in large scale studies. The data resulting from an LC-MS experiment is huge, highly complex and noisy. Accordingly, it has sparked new developments in Bioinformatics, especially in the fields of algorithm development, statistics and software engineering. In a quantitative label-free mass spectrometry experiment, crucial steps are the detection of peptide features in the mass spectra and the alignment of samples by correcting for shifts in retention time. At the moment, it is difficult to compare the plethora of algorithms for these tasks. So far, curated benchmark data exists only for peptide identification algorithms but no data that represents a ground truth for the evaluation of feature detection, alignment and filtering algorithms. RESULTS: We present LC-MSsim, a simulation software for LC-ESI-MS experiments. It simulates ESI spectra on the MS level. It reads a list of proteins from a FASTA file and digests the protein mixture using a user-defined enzyme. The software creates an LC-MS data set using a predictor for the retention time of the peptides and a model for peak shapes and elution profiles of the mass spectral peaks. Our software also offers the possibility to add contaminants, to change the background noise level and includes a model for the detectability of peptides in mass spectra. After the simulation, LC-MSsim writes the simulated data to mzData, a public XML format. The software also stores the positions (monoisotopic m/z and retention time) and ion counts of the simulated ions in separate files. CONCLUSION: LC-MSsim generates simulated LC-MS data sets and incorporates models for peak shapes and contaminations. Algorithm developers can match the results of feature detection and alignment algorithms against the simulated ion lists and meaningful error rates can be computed. 
We anticipate that LC-MSsim will be useful to the wider community to perform benchmark studies and comparisons between computational tools.
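The benchmarking idea, matching detected features against the simulated ion list to compute error rates, can be sketched in Python as follows. This is a simple greedy matcher written for illustration and is not part of LC-MSsim. The tolerance defaults and the `(m/z, retention time)` tuple layout are assumptions.

```python
def match_features(detected, truth, mz_tol=0.01, rt_tol=5.0):
    """Greedily match detected features to ground-truth ions, one to one.

    detected, truth: lists of (monoisotopic_mz, retention_time) tuples
    mz_tol, rt_tol:  absolute tolerances (assumed values for illustration)
    Returns counts of true positives, false positives and false negatives,
    plus precision and recall.
    """
    unmatched = list(truth)
    true_positives = 0
    for mz, rt in detected:
        for index, (true_mz, true_rt) in enumerate(unmatched):
            if abs(mz - true_mz) <= mz_tol and abs(rt - true_rt) <= rt_tol:
                true_positives += 1
                del unmatched[index]  # each true ion may be matched only once
                break
    false_positives = len(detected) - true_positives
    false_negatives = len(unmatched)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return {
        "tp": true_positives,
        "fp": false_positives,
        "fn": false_negatives,
        "precision": precision,
        "recall": recall,
    }
```

A greedy first-fit match is the simplest choice. In dense regions an optimal assignment method would give more reliable counts.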
Project description:The analysis and management of MS data, especially those generated by data independent MS acquisition, exemplified by SWATH-MS, pose significant challenges for proteomics bioinformatics. The large size and vast amount of information inherent to these data sets need to be properly structured to enable an efficient and straightforward extraction of the signals used to identify specific target peptides. Standard XML based formats are not well suited to large MS data files, for example, those generated by SWATH-MS, and compromise high-throughput data processing and storing. We developed mzDB, an efficient file format for large MS data sets. It relies on the SQLite software library and consists of a standardized and portable server-less single-file database. An optimized 3D indexing approach is adopted, where the LC-MS coordinates (retention time and m/z), along with the precursor m/z for SWATH-MS data, are used to query the database for data extraction. In comparison with XML formats, mzDB saves ∼25% of storage space and improves access times by a factor of 2 to as much as 2000, depending on the particular data access. Similarly, mzDB also shows slightly to significantly lower access times in comparison with other formats like mz5. Both C++ and Java implementations, converting raw or XML formats to mzDB and providing access methods, will be released under a permissive license. mzDB can be easily accessed by the SQLite C library and its drivers for all major languages, and browsed with existing dedicated GUIs. The mzDB described here can boost existing mass spectrometry data analysis pipelines, offering unprecedented performance in terms of efficiency, portability, compactness, and flexibility.
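The idea of querying a single-file SQLite database by LC-MS coordinates can be sketched with Python's built-in `sqlite3` module. The schema below is hypothetical and far simpler than the real mzDB layout, which is not described in the abstract. The table name, the column names, the integer `swath_window` column (standing in for the precursor m/z dimension), and the plain composite index are all assumptions made for illustration.

```python
import sqlite3


def build_demo_db(peaks):
    """Create an in-memory database holding (rt, mz, swath_window, intensity).

    swath_window is NULL for MS1 peaks and an integer window id otherwise
    (a hypothetical stand-in for the precursor m/z coordinate).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE peak (rt REAL, mz REAL, swath_window INTEGER, intensity REAL)"
    )
    # Composite index over the three query coordinates.
    conn.execute("CREATE INDEX peak_coords ON peak (swath_window, rt, mz)")
    conn.executemany("INSERT INTO peak VALUES (?, ?, ?, ?)", peaks)
    conn.commit()
    return conn


def query_region(conn, rt_range, mz_range, swath_window=None):
    """Return (rt, mz, intensity) rows inside a retention time / m/z box."""
    sql = (
        "SELECT rt, mz, intensity FROM peak "
        "WHERE rt BETWEEN ? AND ? AND mz BETWEEN ? AND ?"
    )
    params = [rt_range[0], rt_range[1], mz_range[0], mz_range[1]]
    if swath_window is None:
        sql += " AND swath_window IS NULL"
    else:
        sql += " AND swath_window = ?"
        params.append(swath_window)
    sql += " ORDER BY rt, mz"
    return conn.execute(sql, params).fetchall()
```

Passing a file path instead of `":memory:"` to `sqlite3.connect` would give the single-file, server-less storage the abstract refers to.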
Project description:Certain estrogen metabolites have been implicated in the pathophysiology of breast cancer. Moreover, the estrogen metabolite profiles of healthy women and those with (a high risk of) breast cancer differ significantly. The development of an analytical method to determine the relative levels of all the estrogen biotransformation products has been described in van der Berg et al. An improvement on previously developed methods was the ability to also detect molecules such as sulphate and glucuronide conjugates as well as progesterone, estradiol precursors, and metabolites from the 16-hydroxylation metabolic pathway of estrogens simultaneously with all other estrogen metabolites. The data presented here describe the optimisation of a solid phase extraction method with different fractionation steps for LC-MS/MS analysis of 27 estrogen-related metabolites from small urine volumes. Conditions that were optimised include the elution and washing solvent concentration, the urine, loading, washing, and elution volumes, as well as pH. All raw data used to construct the bar graphs presented in this article are included in the supplementary data file. The data indicated that fractionation was necessary in order to elute estrogen metabolites with different chemical properties at different eluate compositions. Only one of the fractions (containing the less water-soluble metabolites) underwent derivatisation before LC-MS/MS analysis.
Project description:Although liquid chromatography/mass spectrometry using selected reaction monitoring (LC/SRM-MS) holds great promise for targeted protein analysis, quantification of therapeutic monoclonal antibody (mAb) in tissues represents a daunting challenge due to the extremely low tissue levels, complexity of tissue matrixes, and the absence of an efficient strategy to develop an optimal LC/SRM-MS method. Here we describe a high-throughput, streamlined strategy for the development of sensitive, selective, and reliable quantitative methods of mAb in tissue matrixes. A sensitive nano-LC/nanospray-MS method was employed to achieve a low lower limit of quantification (LOQ). For selection of signature peptides (SP), the SP candidates were identified by a high-resolution Orbitrap and then optimal SRM conditions for each candidate were obtained using a high-throughput, on-the-fly orthogonal array optimization (OAO) strategy, which is capable of optimizing a large set of SP candidates within a single nano-LC/SRM-MS run. Using the optimized conditions, the candidates were experimentally evaluated for both sensitivity and stability in the target matrixes, and SP selection was based on the results of the evaluation. Two unique SP, respectively from the light and heavy chain, were chosen for quantification of each mAb. The use of two SP improves the quantitative reliability by gauging possible degradation/modification of the mAb. Standard mAb proteins with verified purities were utilized for calibration curves, to prevent the quantitative biases that may otherwise occur when synthesized peptides were used as calibrators. We showed a proof of concept by rapidly developing sensitive nano-LC/SRM-MS methods for quantifying two mAb (8c2 and cT84.66) in multiple preclinical tissues. 
High sensitivity was achieved for both mAb with LOQ ranging from 0.156 to 0.312 µg/g across different tissues, and the overall procedure showed a wide dynamic range (≥500-fold) and good accuracy [relative error (RE) < 18.8%] and precision [interbatch relative standard deviation (RSD) < 18.1%, intrabatch RSD < 17.2%]. The quantitative method was applied to a comprehensive investigation of the steady-state tissue distribution of 8c2 in wild-type mice versus those deficient in FcRn α-chain, FcγIIb, and FcγRI/FcγRIII, following a chronic dosing regimen. This work represents the first extensive quantification of mAb in tissues by an LC/MS-based method.
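The accuracy and precision figures quoted above follow the usual definitions of relative error and relative standard deviation. The short Python sketch below shows one way to compute them. The function names are illustrative, and the use of the sample standard deviation is an assumption, since the abstract does not state which estimator was used.

```python
from statistics import mean, stdev


def relative_error_percent(measured, nominal):
    """Relative error (%) of the mean measured value against the nominal value."""
    return (mean(measured) - nominal) / nominal * 100.0


def rsd_percent(measured):
    """Relative standard deviation (%) using the sample standard deviation."""
    return stdev(measured) / mean(measured) * 100.0
```

Applied to replicate measurements of a quality-control sample, the first function gives the accuracy and the second the precision, within a batch or across batches depending on which replicates are pooled.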
Project description:The marine green macroalga Ulva (Chlorophyta) lives in a mutualistic symbiosis with bacteria that influence growth, development, and morphogenesis. We surveyed changes in Ulva's chemosphere, which was defined as a space where organisms interact with each other via compounds, such as infochemicals, nutrients, morphogens, and defense compounds. Thereby, Ulva mutabilis cooperates with bacteria, in particular, Roseovarius sp. strain MS2 and Maribacter sp. strain MS6 (formerly identified as Roseobacter sp. strain MS2 and Cytophaga sp. strain MS6). Without this accompanying microbial flora, U. mutabilis forms only callus-like colonies. However, upon addition of the two bacteria species, in effect forming a tripartite community, morphogenesis can be completely restored. Under this strictly standardized condition, bioactive and eco-physiologically-relevant marine natural products can be discovered. Solid phase extracted waterborne metabolites were analyzed using a metabolomics platform, facilitating gas chromatography-mass spectrometry (GC-MS) and liquid chromatography-mass spectrometry (LC-MS) analysis, combined with the necessary acquisition of biological metadata. Multivariate statistics of the GC-MS and LC-MS data revealed strong differences between Ulva's growth phases, as well as between the axenic Ulva cultures and the tripartite community. Waterborne biomarkers, including glycerol, were identified as potential indicators for algal carbon source and bacterial-algal interactions. Furthermore, it was demonstrated that U. mutabilis releases glycerol that can be utilized for growth by Roseovarius sp. MS2.