Dataset Information


A large scale test dataset to determine optimal retention index threshold based on three mass spectral similarity measures.

ABSTRACT: Retention index (RI) is useful for metabolite identification. However, when RI is integrated with mass spectral similarity for metabolite identification, many conflicting RI threshold settings have been reported in the literature. In this study, a large-scale test dataset of 5844 compounds with both mass spectra and RI information was created from the National Institute of Standards and Technology (NIST) repetitive mass spectra (MS) and RI library. Three MS similarity measures, the NIST composite measure, the real part of the Discrete Fourier Transform (DFT.R), and the detail of the Discrete Wavelet Transform (DWT.D), were used to investigate the accuracy of compound identification using the test dataset. To imitate real identification experiments, the NIST MS main library was employed as the reference library and the test dataset was used as the search data. Our study shows that the optimal RI thresholds are 22, 15, and 15 i.u. for the NIST composite, DFT.R, and DWT.D measures, respectively, when RI and mass spectral similarity are integrated for compound identification. Compared with mass spectral matching alone, using both RI and mass spectral matching improves identification accuracy by 1.7%, 3.5%, and 3.5% for the three mass spectral similarity measures, respectively. We conclude that the improvement RI matching brings to compound identification depends heavily on the mass spectral similarity measure used and on the accuracy of the RI data.
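As an illustration of how an RI threshold can be combined with spectral matching, the following is a minimal Python sketch, not the authors' implementation. It assumes spectra are stored as {m/z: intensity} dictionaries and uses plain cosine similarity as a stand-in for the three measures named in the abstract (NIST composite, DFT.R, DWT.D). The function names, the library layout, and the behaviour when no candidate falls inside the RI window are all hypothetical choices made for this example.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts.

    Stand-in measure for illustration only; it is not the NIST composite,
    DFT.R, or DWT.D measure from the abstract.
    """
    dot = sum(v * b.get(mz, 0.0) for mz, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(query_spectrum, query_ri, library, ri_threshold=15.0):
    """Return the name of the best library match within the RI window.

    library: list of dicts with hypothetical keys "name", "ri", "spectrum".
    ri_threshold: maximum allowed |RI difference| in index units (i.u.).
    Returns None when no entry lies inside the window (an assumption of
    this sketch; a real workflow might fall back to spectrum-only search).
    """
    # Step 1: keep only entries whose RI is close enough to the query RI.
    candidates = [e for e in library if abs(e["ri"] - query_ri) <= ri_threshold]
    if not candidates:
        return None
    # Step 2: rank the surviving entries by spectral similarity.
    best = max(
        candidates,
        key=lambda e: cosine_similarity(query_spectrum, e["spectrum"]),
    )
    return best["name"]
```

Passing `ri_threshold=float("inf")` disables the RI filter and reduces the search to spectrum-only matching, which is the baseline the abstract compares against.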


PROVIDER: S-EPMC3430127 | BioStudies | 2012-01-01

REPOSITORIES: biostudies

Similar Datasets

2012-01-01 | S-EPMC3418476 | BioStudies
2014-01-01 | S-EPMC4308046 | BioStudies
2011-01-01 | S-EPMC3136582 | BioStudies
2011-01-01 | S-EPMC3146571 | BioStudies
2013-01-01 | S-EPMC3901269 | BioStudies
2013-01-01 | S-EPMC3686837 | BioStudies
2020-01-01 | S-EPMC7576811 | BioStudies
2021-01-01 | S-EPMC7909622 | BioStudies
2013-01-01 | S-EPMC3787630 | BioStudies
1000-01-01 | S-EPMC3324511 | BioStudies