Predicting in silico electron ionization mass spectra using quantum chemistry.
ABSTRACT: Compound identification by mass spectrometry needs reference mass spectra. While there are over 102 million compounds in PubChem, fewer than 300,000 curated electron ionization (EI) mass spectra are available from the NIST or MoNA mass spectral databases. Here, we test quantum chemistry methods (QCEIMS) to generate in silico EI mass spectra (MS) by combining molecular dynamics (MD) with statistical methods. To test the accuracy of predictions, in silico mass spectra of 451 small molecules were generated and compared to experimental spectra from the NIST 17 mass spectral library. The compounds covered 43 chemical classes, ranging up to 358 Da. Organic oxygen compounds had a lower matching accuracy, while computation time increased exponentially with molecular size. The parameter space was probed to increase prediction accuracy, including initial temperatures, the number of MD trajectories and impact excess energy (IEE). Conformational flexibility was not correlated with the accuracy of predictions. Overall, QCEIMS can predict 70 eV electron ionization spectra of chemicals from first principles. Improved methods to calculate potential energy surfaces (PES) are still needed before QCEIMS mass spectra of novel molecules can be generated at large scale.
Project description:Retention index (RI) is useful for metabolite identification. However, when RI is integrated with mass spectral similarity for metabolite identification, conflicting RI threshold settings have been reported in the literature. In this study, a large-scale test dataset of 5844 compounds with both mass spectra and RI information was created from the National Institute of Standards and Technology (NIST) repetitive mass spectra (MS) and RI library. Three MS similarity measures, the NIST composite measure, the real part of the Discrete Fourier Transform (DFT.R), and the detail of the Discrete Wavelet Transform (DWT.D), were used to investigate the accuracy of compound identification using the test dataset. To imitate real identification experiments, the NIST MS main library was employed as the reference library and the test dataset was used as search data. Our study shows that the optimal RI thresholds are 22, 15, and 15 i.u. for the NIST composite, DFT.R, and DWT.D measures, respectively, when RI and mass spectral similarity are integrated for compound identification. Compared to mass spectrum matching alone, using both RI and mass spectral matching improves the identification accuracy by 1.7%, 3.5%, and 3.5% for the three mass spectral similarity measures, respectively. It is concluded that the improvement from RI matching for compound identification depends heavily on the MS similarity measure and the accuracy of the RI data.
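The idea of integrating RI with spectral matching can be illustrated with a minimal sketch: library candidates are first filtered by an RI window and the survivors are ranked by spectral similarity. The sketch assumes a plain cosine similarity in place of the NIST composite, DFT.R, or DWT.D measures, and the library entries, the function names, and the dictionary layout are hypothetical; only the 22 i.u. default comes from the abstract.

```python
import math

def cosine(a, b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts."""
    dot = sum(a[mz] * b[mz] for mz in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def identify(query_spectrum, query_ri, library, ri_window=22.0):
    """Keep library entries within ri_window of the query RI, then rank by similarity."""
    hits = [
        (cosine(query_spectrum, entry["spectrum"]), entry["name"])
        for entry in library
        if abs(entry["ri"] - query_ri) <= ri_window
    ]
    return sorted(hits, reverse=True)

# Hypothetical toy library: two compounds with nearly identical spectra
# that differ clearly in retention index.
library = [
    {"name": "A", "ri": 1000.0, "spectrum": {43: 100, 57: 80, 71: 40}},
    {"name": "B", "ri": 1100.0, "spectrum": {43: 100, 57: 80, 71: 42}},
]
query = {43: 100, 57: 80, 71: 41}
ranked = identify(query, 1005.0, library)               # RI filter removes B
unfiltered = identify(query, 1005.0, library, 1000.0)   # window too wide to filter
```

In this toy case the spectra alone barely separate the two candidates, while the RI window leaves only the entry whose RI is close to the query.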
Project description:Six synthesized 6-nitroquipazine derivatives were examined by electron ionization (EI) and electrospray ionization (ESI) mass spectrometry in positive and negative ion mode. The compounds exhibit high affinity for the serotonin transporter (SERT) and belong to a new class of SERT inhibitors. The EI mass spectra registered in negative ion mode showed prominent molecular ions for all the compounds studied. All EI mass spectra and all ESI mass spectra showed similar fragmentation pathways of molecular ions, but the pathways differed between EI and ESI. The differences were explained with the aid of theoretical evaluation of the stability of the respective radical ions (EI MS) and protonated ions (ESI MS).
Project description:A series of N,N-disubstituted piperazines were synthesized containing the structural elements of both methylenedioxybenzylpiperazine (MDBP) and trifluoromethylphenylpiperazine (TFMPP) in a single molecule. These six potential designer drug molecules having a regioisomeric relationship were compared in gas chromatography-mass spectrometry (GC-MS), gas chromatography-infrared spectroscopy and serotonin receptor affinity studies. These compounds were separated by capillary gas chromatography on an Rxi®-17Sil MS stationary phase film and the elution order appears to be determined by the position of aromatic ring substitution. The majority of electron ionization mass spectral fragment ions occur via processes initiated by one of the two nitrogen atoms of the piperazine ring. The major electron ionization mass spectrometry (EI-MS) fragment ions observed in all six of these regioisomeric substances occur at m/z = 364, 229, 163 and 135. The relative intensity of the various fragment ions is also equivalent in each of the six EI-MS spectra. The vapour phase infrared spectra provide a number of absorption bands to differentiate among the six individual compounds in this regioisomeric set. Thus, the mass spectra place these compounds into a single group and the vapour phase infrared spectra differentiate among the six regioisomeric possibilities. All of the TFMPP-MDBP regioisomers displayed significant binding to 5-HT2B receptors and, in contrast to 3-TFMPP, most of these TFMPP-MDBP isomers did not show significant binding at 5-HT1 receptor subtypes. Only the 3-TFMPP-3,4-MDBP (Compound 5) isomer displayed affinity comparable to 3-TFMPP at 5-HT1A receptors (Ki = 188 nmol/L).
Project description:The analytical capabilities associated with the use of silylation reactions have been extended to a new class of organic molecules, nitroaromatic compounds (NACs). These compounds are a possible contributor to urban particulate matter of secondary origin, which would make them important analytes due to their (1) detrimental health effects, (2) potential to affect aerosol optical properties, and (3) usefulness for identifying PM<sub>2.5</sub> from biomass burning. The technique is based on derivatization of the parent NACs by using N,O-bis(trimethylsilyl)trifluoroacetamide, one of the most prevalent derivatization reagents for analyzing hydroxylated molecules, followed by gas chromatography-mass spectrometry using electron ionization (EI) and methane chemical ionization (CI). This method is evaluated for 32 NACs including nitrophenols, methyl-/methoxy-nitrophenols, nitrobenzoic acids, and nitrobenzyl alcohols. Electron ionization spectra were characterized by a high abundance of ions corresponding to [M<sup>+</sup>] or [M<sup>+</sup> - 15]. Chemical ionization spectra exhibited high abundance for [M<sup>+</sup> + 1], [M<sup>+</sup> - 15], and [M<sup>+</sup> + 29] ions. Both EI and CI spectra exhibit ions specific to nitro group(s) at [M<sup>+</sup> - 31], [M<sup>+</sup> - 45], and [M<sup>+</sup> - 60]. The strong abundance observed for [M<sup>+</sup>] (EI), [M<sup>+</sup> - 15] (EI/CI), or [M<sup>+</sup> + 1] (CI) ions is consistent with the high charge stabilizing ability associated with aromatic compounds. The combination of EI and CI ionization offers strong capabilities for detection and identification of NACs. Spectra associated with NACs, containing hydrogen, carbon, oxygen, and nitrogen atoms only, as silylated derivatives show fragment/adduct ions at either (a) odd or (b) even masses that indicate either (a) odd or (b) even number of nitro groups, respectively.
Mass spectra associated with silylated NACs exhibited three distinct regions of characteristic fragmentation, with specific patterns associated with (1) -OH and/or -COOH groups, (2) -NO<sub>2</sub> group(s), and (3) benzene ring(s). These findings were confirmed with applications to chamber aerosol and ambient PM<sub>2.5</sub>.
Project description:Compound identification is a key component of data analysis in the applications of gas chromatography-mass spectrometry (GC-MS). Currently, the most widely used compound identification method is mass spectrum matching, in which the dot product and its composite version are employed as spectral similarity measures. Several forms of transformations for fragment ion intensities have also been proposed to increase the accuracy of compound identification. In this study, we introduced partial and semipartial correlations as mass spectral similarity measures and applied them to identify compounds along with different transformations of peak intensity. Mixture versions of the proposed method were also developed to further improve the accuracy of compound identification. To demonstrate the performance of the proposed spectral similarity measures, the National Institute of Standards and Technology (NIST) mass spectral library and replicate spectral library were used as the reference library and the query spectra, respectively. Identification results showed that the mixture partial and semipartial correlations always outperform both the dot product and its composite measure. The mixture similarity with semipartial correlation has the highest accuracy of 84.6% in compound identification with a transformation of (0.53,1.3) for fragment ion intensity and m/z value, respectively.
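As a rough illustration of the baseline that the partial and semipartial correlations are compared against, the following sketch computes a weighted dot product (cosine) similarity. It assumes the transformation takes the commonly used form intensity^a × (m/z)^b, with the pair (0.53, 1.3) from the abstract as default exponents; the function names and the example spectra are hypothetical, and the partial, semipartial, and mixture measures themselves are not implemented here.

```python
import math

def transform(spectrum, a=0.53, b=1.3):
    """Weight each peak as intensity**a * mz**b (assumed form of the transformation)."""
    return {mz: (intensity ** a) * (mz ** b) for mz, intensity in spectrum.items()}

def weighted_cosine(query, reference, a=0.53, b=1.3):
    """Cosine similarity between two {m/z: intensity} spectra after peak weighting."""
    q, r = transform(query, a, b), transform(reference, a, b)
    dot = sum(q[mz] * r[mz] for mz in q.keys() & r.keys())
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_r = math.sqrt(sum(v * v for v in r.values()))
    return dot / (norm_q * norm_r) if norm_q and norm_r else 0.0

# Hypothetical spectra for illustration only.
spectrum = {41: 35.0, 55: 100.0, 69: 60.0, 83: 20.0}
other = {44: 100.0, 58: 15.0}
same = weighted_cosine(spectrum, spectrum)
disjoint = weighted_cosine(spectrum, other)
```

Raising intensities to a power below one dampens dominant peaks, while the m/z exponent gives more weight to the usually more specific high-mass fragments.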
Project description:The high-throughput gas chromatography/mass spectrometry (GC/MS) technology offers a powerful means of analyzing a large number of chemical and biological samples. One of the important analyses of GC/MS data is compound identification. In this work, novel spectral similarity measures based on the discrete wavelet and Fourier transforms were proposed. The proposed methods are composite similarities that are composed of weighted intensities and wavelet/Fourier coefficients using cosine correlation. The performance of the proposed approaches along with the existing similarity measures was evaluated using the NIST Chemistry WebBook mass database maintained by the National Institute of Standards and Technology (NIST) as a library of reference spectra and repetitive mass spectral data as query spectra. The analysis results showed that the identification accuracies of the wavelet- and Fourier-transform-based methods were improved by 2.02% and 1.95%, respectively, compared to that of the weighted dot product (cosine correlation) and by 3.01% and 3.08%, respectively, compared to that of the composite similarity measure. The improved identification accuracy demonstrates that the proposed approaches outperformed the existing similarity measures in the literature.
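A minimal sketch of the Fourier-based idea follows, under stated assumptions: each spectrum is expanded to a dense intensity vector over integer m/z values, the real part of its discrete Fourier transform is computed directly from the definition, and two spectra are compared by the cosine correlation of those real parts. The composite weighting of intensities and coefficients described in the abstract is not reproduced, and the function names, the `max_mz` parameter, and the example spectra are hypothetical.

```python
import math

def to_vector(spectrum, max_mz):
    """Dense intensity vector indexed by integer m/z from 0 to max_mz."""
    return [float(spectrum.get(mz, 0.0)) for mz in range(max_mz + 1)]

def dft_real(x):
    """Real part of the discrete Fourier transform, computed from the definition."""
    n = len(x)
    return [
        sum(x[t] * math.cos(2.0 * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def cosine(u, v):
    """Cosine correlation between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def dft_real_similarity(spec_a, spec_b, max_mz=100):
    """Compare two {m/z: intensity} spectra via the real parts of their DFTs."""
    ra = dft_real(to_vector(spec_a, max_mz))
    rb = dft_real(to_vector(spec_b, max_mz))
    return cosine(ra, rb)

# Hypothetical spectra for illustration only.
spec_a = {43: 100.0, 57: 80.0}
spec_b = {43: 100.0, 71: 50.0}
sim_same = dft_real_similarity(spec_a, spec_a)
sim_diff = dft_real_similarity(spec_a, spec_b)
```

The direct summation costs O(N²) per spectrum, which is acceptable for a short illustration; a fast Fourier transform routine would be the natural choice for library-scale searches.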
Project description:The "Critical Assessment of Small Molecule Identification" (CASMI) contest was aimed at testing strategies for small molecule identification that are currently available in the experimental and computational mass spectrometry community. We have applied tandem mass spectral library search to solve Category 2 of the CASMI Challenge 2012 (best identification for high resolution LC/MS data). More than 230,000 tandem mass spectra that are part of four well-established libraries (MassBank, the collection of tandem mass spectra of the "NIST/NIH/EPA Mass Spectral Library 2012", METLIN, and the 'Wiley Registry of Tandem Mass Spectral Data, MSforID') were searched. The sample spectra acquired in positive ion mode were processed. Seven out of 12 challenges did not produce putative positive matches, simply because reference spectra were not available for the compounds searched. This suggests that to some extent the limited coverage of chemical space with high-quality reference spectra is still a problem encountered in tandem mass spectral library search. Solutions were submitted for five challenges. Three compounds were correctly identified (kanamycin A, benzyldiphenylphosphine oxide, and 1-isopropyl-5-methyl-1H-indole-2,3-dione). In the absence of any reference spectrum, a false positive identification was obtained for 1-aminoanthraquinone by matching the corresponding sample spectrum to the structurally related compounds N-phenylphthalimide and 2-aminoanthraquinone. Another false positive result was submitted for 1H-benz[g]indole; for the 1H-benz[g]indole-specific sample spectra provided, carbazole was listed as the best matching compound. In this case, the quality of the available 1H-benz[g]indole-specific reference spectra was found to hamper unequivocal identification.
Project description:Liquid chromatography-coulometric array detection (LC-EC) is a sensitive, quantitative, and robust metabolomics profiling tool that complements the commonly used mass spectrometry (MS) and nuclear magnetic resonance (NMR)-based approaches. However, LC-EC provides little structural information. We recently demonstrated a workflow for the structural characterization of metabolites detected by LC-EC profiling combined with LC-electrospray ionization (ESI)-MS and microNMR. This methodology is now extended to include (i) gas chromatography (GC)-electron ionization (EI)-MS analysis to fill structural gaps left by LC-ESI-MS and NMR and (ii) secondary fractionation of LC-collected fractions containing multiple coeluting analytes. GC-EI-MS spectra have more informative fragment ions that are reproducible for database searches. Secondary fractionation provides enhanced metabolite characterization by reducing spectral overlap in NMR and ion suppression in LC-ESI-MS. The need for these additional methods in the analysis of the broad chemical classes and concentration ranges found in plasma is illustrated with discussion of four specific examples: (i) characterization of compounds for which one or more of the detectors is insensitive (e.g., positional isomers in LC-MS, the direct detection of carboxylic groups and sulfonic groups in (1)H NMR, or nonvolatile species in GC-MS), (ii) detection of labile compounds, (iii) resolution of closely eluting and/or coeluting compounds, and (iv) the capability to harness structural similarities common in many biologically related, LC-EC-detectable compounds.
Project description:A large fraction of ions observed in electrospray liquid chromatography-mass spectrometry (LC-ESI-MS) experiments of biological samples remain unidentified. One of the main reasons for this is that spectral libraries of pure compounds fail to account for the complexity of the metabolite profiling of complex materials. Recently, the NIST Mass Spectrometry Data Center has been developing a novel type of searchable mass spectral library that includes all recurrent unidentified spectra found in the sample profile. These libraries, in conjunction with the NIST tandem mass spectral library, allow analysts to explore most of the chemical space accessible to LC-MS analysis. In this work, we demonstrate how these libraries can provide a reliable fingerprint of the material by applying them to a variety of urine samples, including an extremely altered urine from cancer patients undergoing total body irradiation. The same workflow is applicable to any other biological fluid. The selected class of acylcarnitines is examined in detail, and derived libraries and related software are freely available. They are intended to serve as online resources for continuing community review and improvement.
Project description:Many metabolomic applications use gas chromatography/mass spectrometry (GC/MS) under standard 70 eV electron ionization (EI) parameters. However, the abundance of molecular ions is often extremely low, impeding the calculation of elemental compositions for the identification of unknown compounds. On changing the beam-steering voltage of the ion source, the relative abundances of molecular ions at 70 eV EI were increased up to ten-fold for alkanes, fatty acid methyl esters and trimethylsilylated metabolites, concomitant with 2-fold absolute increases in ion intensities. We have compared the abundance, mass accuracy and isotope ratio accuracy of molecular species in EI with those in chemical ionization (CI) with methane as reagent gas under high-mass tuning. Thirty-three peaks of a diverse set of trimethylsilylated metabolites were analyzed in triplicate, resulting in 342 ion species ([M+H](+), [M-CH(3)](+) for CI and [M](+.), [M-CH(3)](+.) for EI). On average, CI yielded 8-fold more intense molecular species than EI. Using internal recalibration, average mass errors of 1.8 +/- 1.6 mm/z units and isotope ratio errors of 2.3 +/- 2.0% (A+1/A ratio) and 1.7 +/- 1.8% (A+2/A ratio) were obtained. When constraining lists of calculated elemental compositions by chemical and heuristic rules using the Seven Golden Rules algorithm and PubChem queries, the correct formula was retrieved as top hit in 60% of the cases and within the top-3 hits in 80% of the cases.
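The two error metrics quoted above can be made concrete with a short arithmetic sketch. It assumes that mass error is the signed difference between measured and theoretical m/z expressed in milli-m/z units, and that the isotope ratio error is a relative deviation in percent; the abstract does not state either definition explicitly, and the function names and numeric inputs are hypothetical.

```python
def mass_error_mmz(measured_mz, theoretical_mz):
    """Signed mass error in milli-m/z units (1 mm/z = 0.001 m/z)."""
    return (measured_mz - theoretical_mz) * 1000.0

def isotope_ratio_error_pct(measured_ratio, theoretical_ratio):
    """Relative deviation of an isotope ratio (e.g. A+1/A) in percent (assumed definition)."""
    return (measured_ratio - theoretical_ratio) / theoretical_ratio * 100.0

# Hypothetical measurement of one ion species.
mass_err = mass_error_mmz(217.1085, 217.1067)
ratio_err = isotope_ratio_error_pct(0.205, 0.200)
```

Errors of this size determine how many candidate elemental compositions survive before chemical and heuristic rules are applied.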