iMatch: a retention index tool for analysis of gas chromatography-mass spectrometry data.
ABSTRACT: A method was developed to employ National Institute of Standards and Technology (NIST) 2008 retention index database information for molecular retention matching by constructing a set of empirical distribution functions (DFs) of the absolute deviation of each retention index from its mean value. The effects of different experimental parameters on the molecules' retention indices were first assessed. The column class, the column type, and the data type have significant effects on the retention index values acquired on capillary columns. However, the normal alkane retention index (I(norm)) with the ramp condition is similar to the linear retention index (I(T)), while the I(norm) with the isothermal condition is similar to the Kováts retention index (I). As for the I(norm) with the complex condition, these data should be treated as an additional group, because the mean I(norm) value of the polar column is significantly different from the I(T). Based on this analysis, nine DFs were generated from the grouped retention index data. The DF information was further implemented in a software program called iMatch. The performance of iMatch was evaluated using experimental data of a mixture of standards and a metabolite extract of rat plasma with spiked-in standards. About 19% of the molecules identified by ChromaTOF were filtered out by iMatch from the identification list of electron ionization (EI) mass spectral matching, while all of the spiked-in standards were preserved. The analysis results demonstrate that using the retention index values, via a set of DFs, can improve spectral matching-based identifications by removing a significant portion of false positives.
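The filtering idea in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the iMatch implementation: the function names, the 95% confidence level, and the rule of pooling deviations within one group of library records are all hypothetical choices made for the example.

```python
import math
from statistics import mean


def abs_deviations(ri_by_compound):
    """Pool |RI - compound mean| over every compound with at least two records.

    ri_by_compound maps a compound name to the list of library retention
    indices recorded for it within ONE group (e.g. one column class and
    one data type). The grouping itself is assumed to be done upstream.
    """
    deviations = []
    for values in ri_by_compound.values():
        if len(values) < 2:
            continue  # a single record carries no deviation information
        centre = mean(values)
        deviations.extend(abs(v - centre) for v in values)
    return sorted(deviations)


def window_from_edf(sorted_deviations, confidence=0.95):
    """Smallest deviation d whose empirical distribution value F(d) >= confidence."""
    if not sorted_deviations:
        raise ValueError("no deviations to build a distribution from")
    n = len(sorted_deviations)
    k = math.ceil(confidence * n)  # number of points that must lie at or below d
    return sorted_deviations[k - 1]


def passes_ri_filter(observed_ri, library_mean_ri, window):
    """Keep a spectral-match candidate only if its RI lies inside the window."""
    return abs(observed_ri - library_mean_ri) <= window
```

A candidate from spectral matching would be dropped when passes_ri_filter returns False for the window of its group. One such window per group corresponds to the "set of DFs" described above.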
Project description:We developed a method, iMatch2, for compound identification using retention indices (RI) in the NIST11 library. A three-way ANOVA test and a Kruskal-Wallis test each demonstrate that the column class and temperature program type defined by the NIST library are the dominant factors affecting the magnitude of the retention index, while the retention index data type does not cause a significant difference. The linear regression transformation developed for merging retention indices with different data types, but the same column class and temperature program type, reduces the standard deviation of the retention index by up to 8% compared to the simple union approach used in the original iMatch. Among the outlier detection methods for removing retention indices that differ greatly from the remaining data of the same compound, the Tietjen-Moore test and the generalized extreme studentized deviate test are the strictest, while methods such as Dixon's test, the Thompson tau approach, and Grubbs' test are more conservative. To improve the accuracy of the retention index window, the concept of a compound-specific retention index window is introduced for compounds with a large number of retention indices in the NIST11 library, while the retention index window is calculated from empirical distributions for compounds with a small number of retention indices. Analysis of the experimental data of a mixture of compound standards and the metabolite extract from mouse liver shows significant improvement in retention index quality in the NIST11 library and in the new data analysis methods.
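The merging step described above can be illustrated with a short Python sketch. The abstract does not give the fitting details, so this assumes an ordinary least-squares line fitted on paired values (for example, per-compound means available in both data types); the function names fit_line and merge_ri are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept.

    x and y are paired retention indices of the same compounds in a
    source data type and a target data type (assumed pairing).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired values")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("source values are all identical")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def merge_ri(source_values, target_values, slope, intercept):
    """Map source-type values onto the target scale, then pool the two sets."""
    transformed = [slope * v + intercept for v in source_values]
    return list(target_values) + transformed
```

The simple union would instead pool source_values and target_values unchanged. When the two data types differ by a systematic offset, the transformed pool should show a smaller standard deviation, which is the effect the abstract reports.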
Project description:Sugar and organic acid contents are major factors for tomato fruit flavour and are important breeding traits. Here we provide an improved protocol for accurate quantification of the main sugars, glucose and fructose, and the organic acids, citric acid and malic acid, present in tomato. The tomato extract is spiked with lactose and tricarballylic acid as internal standards and loaded onto an NH2 solid phase extraction (SPE) column. The sugars appear in the flow-through and are subsequently analysed by HPLC using a Nucleodur NH2 column and a refractive index detector. The organic acids bind to the SPE column and are eluted with 400 mM phosphoric acid. For analysis, the organic acids are separated by HPLC using a Nucleodur C18ec column and detected by UV absorption at 210 nm. The method shows excellent inter-day and intra-day reproducibility for glucose, fructose and citric acid, with standard deviations of 1-5%. Quantification of citric acid by HPLC and by GC-MS agreed to within a deviation of less than 3%. • Simple method for quantification of glucose, fructose, citric acid and malic acid in tomato. • Efficient removal of interfering compounds by solid phase extraction. • High intra- and inter-day reproducibility.
Project description:Due to the high complexity of the metabolome, comprehensive 2D gas chromatography time-of-flight mass spectrometry (GC×GC-TOF MS) is considered a powerful analytical platform for metabolomics studies. However, GC×GC-TOF MS is not widely applied in metabolomics owing to the lack of a bioinformatics system for data analysis. We developed a computational platform entitled metabolomics profiling pipeline (MetPP) for analysis of metabolomics data acquired on a GC×GC-TOF MS system. MetPP can perform peak filtering and merging, retention index matching, peak list alignment, normalization, statistical significance tests and pattern recognition, using the peak lists deconvoluted from the instrument data as its input. The performance of the MetPP software was tested with two sets of experimental data acquired in a spike-in experiment and a biomarker discovery experiment, respectively. MetPP not only correctly aligned the spiked-in metabolite standards from the experimental data, but also correctly recognized their concentration differences between sample groups. For analysis of the biomarker discovery data, 15 metabolites were recognized with significant concentration differences between the sample groups, and these results agree with the literature results of histological analysis, demonstrating the effectiveness of applying the MetPP software to disease biomarker discovery. The source code of MetPP is available at http://email@example.com. Supplementary data are available at Bioinformatics online.
Project description:To circumvent the detrimental effects of large-volume injection with fixed-loop injector in modern supercritical fluid chromatography, the feasibility of performing multiple injection was investigated. By accumulating analytes from a certain number of continual small-volume injections, compounds can be concentrated on the column head, and this leads to signal enhancement compared with a single injection. The signal to noise enhancement of different compounds appeared to be associated with their retention on different stationary phases and with type of sample diluent. The diethylamine column gave the best signal to noise enhancement when acetonitrile was used as sample diluent and the 2-picolylamine column showed the best overall performance with water as the sample diluent. The advantage of multiple injection over one-time large-volume injection was proven with sulfanilamide, with both acetonitrile and water as sample diluents. The multiple injection approach exhibited comparable within- and between-day precision of retention time and peak area with those of single injections. The potential of the multiple injection approach was demonstrated in the analysis of sulfanilamide-spiked honey extract and diclofenac-spiked ground water sample. The limitations of this approach were also discussed.
Project description:The data cover 198 volatile compounds in six currant cultivars grown in China, analyzed by SPME-GC-MS. Volatile compounds in these currant samples were identified by two methods: comparing retention indices with reference standards and matching mass spectra against the NIST11 library. A synthetic currant matrix prepared according to the currant juice conditions was extracted and analyzed using the same extraction procedure as the currant samples. Standard curves were generated for quantification of the volatile compounds. For volatiles without an available standard, standards with the same number of carbon atoms or a similar functional structure were used for quantification. Further interpretation and discussion can be found in the article entitled "Characterization of Free and Bound Volatile Compounds in Six Ribes nigrum L. Blackcurrant Cultivars" (Liu et al., 2018).
Project description:The utility of metabolomics is well documented; however, its full scientific promise has not yet been realized due to multiple technical challenges. These grand challenges include accurate chemical identification of all observable metabolites and the limited depth-of-coverage of current metabolomics methods. Here, we report a combinatorial solution to aid in both grand challenges using UHPLC-trapped ion mobility spectrometry coupled to tandem mass spectrometry (UHPLC-TIMS-TOF-MS). TIMS offers additional depth-of-coverage through the increased peak capacities realized with multi-dimensional UHPLC-TIMS separations. Metabolite identification confidence is simultaneously enhanced by incorporating orthogonal collision cross section (CCS) data matching. To facilitate metabolite identifications, we created a CCS library of 146 plant natural products. This library was generated using TIMS with N2 drift gas to record the TIMSCCSN2 of plant natural products with a high degree of reproducibility; i.e., average RSD = 0.10%. The robustness of TIMSCCSN2 data matching was tested using authentic standards spiked into complex plant extracts, and the precision of the CCS measurements was determined to be independent of matrix effects. The utility of UHPLC-TIMS-TOF-MS/MS in metabolomics was then demonstrated using extracts from the model legume Medicago truncatula, and metabolites were confidently identified based on retention time, accurate mass, molecular formula, and CCS.
Project description:Comprehensive two-dimensional gas chromatography mass spectrometry (GC × GC-MS) has been widely used for analysis of volatile compounds. However, the second dimension retention index (I) of each compound is not widely used to aid compound identification owing to the limited accuracy of I calculation. We report a surface fitting approach to the calculation of I using n-alkanes (C7-C30) as references, where the second dimension retention time (2tR) and the second dimension column temperature (2Te) formed the X-Y plane and I was the Z-axis, together forming the I surface. Compared to the conventional approach for calculating I using isovolatility curves, the surface fitting approach eliminated the construction of isovolatility curves for the reference compounds and gave better reproducibility. The goodness of fit of the proposed surface was R2 = 0.9999 and RMSE = 6.1 retention index units (iu). Ten-fold cross validation demonstrated that the surface fitting approach had good predictability, with average R2 = 0.9999 and RMSE = 6.6 iu. The developed method was also applied to calculate the second dimension retention indices of compound standards in two commercial mixtures, MegaMix A and MegaMix B. The mean standard deviation of the calculated I was only 1.6 iu for compounds in MegaMix A and 3.4 iu for compounds in MegaMix B. Compared with the literature results, the small standard deviation of the retention indices calculated by surface fitting shows that the method has less measurement variability than the conventional isovolatility curve approach.
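A least-squares surface fit of this kind can be sketched in plain Python. The abstract does not state the functional form of the surface, so the bilinear basis (1, t, T, t*T) used here is an assumption chosen for brevity, and all function names are hypothetical. Each reference point is a tuple of second dimension retention time, second dimension column temperature, and the known I of the n-alkane.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def basis(t, temp):
    """Assumed bilinear basis; the published surface may use another form."""
    return [1.0, t, temp, t * temp]


def fit_surface(points):
    """Fit I = f(t, temp) by solving the normal equations (A^T A) c = A^T y."""
    rows = [basis(t, temp) for t, temp, _ in points]
    targets = [ri for _, _, ri in points]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(ata, aty)


def predict(coeffs, t, temp):
    """Read the retention index off the fitted surface."""
    return sum(c * b for c, b in zip(coeffs, basis(t, temp)))


def rmse(coeffs, points):
    """Root-mean-square error of the surface over a set of points."""
    errors = [(predict(coeffs, t, temp) - ri) ** 2 for t, temp, ri in points]
    return math.sqrt(sum(errors) / len(errors))
```

With real data, the points would come from the n-alkane references, and predict would then give I for an analyte directly from its 2tR and 2Te, with no isovolatility curves to construct. Holding out subsets of the reference points and calling rmse on them would mimic the cross validation mentioned in the abstract.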
Project description:We show that in a common high-dimensional covariance model, the choice of loss function has a profound effect on optimal estimation. In an asymptotic framework based on the Spiked Covariance model and use of orthogonally invariant estimators, we show that optimal estimation of the population covariance matrix boils down to design of an optimal shrinker η that acts elementwise on the sample eigenvalues. Indeed, to each loss function there corresponds a unique admissible eigenvalue shrinker η* dominating all other shrinkers. The shape of the optimal shrinker is determined by the choice of loss function and, crucially, by inconsistency of both eigenvalues and eigenvectors of the sample covariance matrix. Details of these phenomena and closed form formulas for the optimal eigenvalue shrinkers are worked out for a menagerie of 26 loss functions for covariance estimation found in the literature, including the Stein, Entropy, Divergence, Fréchet, Bhattacharya/Matusita, Frobenius Norm, Operator Norm, Nuclear Norm and Condition Number losses.
Project description:The gradient produced by an HPLC is never the same as the one it is programmed to produce, but non-idealities in the gradient can be taken into account if they are measured. Such measurements are routine, yet only one general approach has been described to make them: both HPLC solvents are replaced with water, solvent B is spiked with 0.1% acetone, and the gradient is measured by UV absorbance. Despite the widespread use of this procedure, we found a number of problems and complications with it, mostly stemming from the fact that it measures the gradient under abnormal conditions (e.g. both solvents are water). It is also generally not amenable to MS detection, leaving those with only an MS detector no way to accurately measure their gradients. We describe a new approach called "Measure Your Gradient" that potentially solves these problems. One runs a test mixture containing 20 standards on a standard stationary phase and enters their gradient retention times into open-source software available at www.measureyourgradient.org. The software uses the retention times to back-calculate the gradient that was truly produced by the HPLC. Here we present a preliminary investigation of the new approach. We found that gradients measured this way are comparable to those measured by a more accurate, albeit impractical, version of the conventional approach. The new procedure worked with different gradients, flow rates, column lengths, inner diameters, on two different HPLCs, and with six different batches of the standard stationary phase.
Project description:Liquid Chromatography Time-of-Flight Mass Spectrometry (LC-TOF-MS) is widely used for profiling metabolite compounds. LC-TOF-MS is a chemical analysis technique that combines the physical separation capabilities of high-pressure liquid chromatography (HPLC) with the mass analysis capabilities of Time-of-Flight Mass Spectrometry (TOF-MS) which utilizes the difference in the flight time of ions due to difference in the mass-to-charge ratio. Since metabolite compounds have various chemical characteristics, their precise identification is a crucial problem of metabolomics research. Contemporaneously analyzed reference standards are commonly required for mass spectral matching and retention time matching, but there are far fewer reference standards than there are compounds in the organism. We therefore developed a retention time prediction method for HPLC to improve the accuracy of identification of metabolite compounds. This method uses a combination of Support Vector Regression and Multiple Linear Regression adaptively to the measured retention time. We achieved a strong correlation (correlation coefficient = 0.974) between measured and predicted retention times for our experimental data. We also demonstrated a successful identification of an E. coli metabolite compound that cannot be identified by precise mass alone.