Dataset Information

The METLIN small molecule dataset for machine learning-based retention time prediction.


ABSTRACT: Machine learning has been extensively applied in small molecule analysis to predict a wide range of molecular properties and processes including mass spectrometry fragmentation or chromatographic retention time. However, current approaches for retention time prediction lack sufficient accuracy due to limited available experimental data. Here we introduce the METLIN small molecule retention time (SMRT) dataset, an experimentally acquired reverse-phase chromatography retention time dataset covering up to 80,038 small molecules. To demonstrate the utility of this dataset, we deployed a deep learning model for retention time prediction applied to small molecule annotation. Results showed that in 70% of the cases, the correct molecular identity was ranked among the top 3 candidates based on their predicted retention time. We anticipate that this dataset will enable the community to apply machine learning or first principles strategies to generate better models for retention time prediction.
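
The top-3 evaluation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how candidates are ordered, so this example assumes they are ranked by the absolute difference between predicted and observed retention time. The function names, candidate names, and retention time values are all hypothetical and are not taken from the SMRT dataset or the authors' code.

```python
from typing import Dict, List, Tuple


def rank_candidates_by_rt(
    observed_rt: float, predicted_rt: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Rank candidates by |predicted RT - observed RT|, smallest error first.

    Assumption: absolute error is the ranking criterion; the source does not
    specify the exact scoring rule.
    """
    errors = ((name, abs(rt - observed_rt)) for name, rt in predicted_rt.items())
    return sorted(errors, key=lambda pair: pair[1])


def in_top_k(correct: str, ranked: List[Tuple[str, float]], k: int = 3) -> bool:
    """Return True if the correct candidate is among the first k ranked entries."""
    return correct in [name for name, _ in ranked[:k]]


# Hypothetical predicted retention times (arbitrary time units) for four
# candidate structures that share the same mass.
candidates = {
    "candidate_A": 612.0,
    "candidate_B": 540.5,
    "candidate_C": 701.2,
    "candidate_D": 598.3,
}

# Hypothetical observed retention time of the unknown feature.
ranked = rank_candidates_by_rt(600.0, candidates)

for name, error in ranked:
    print(f"{name}: error = {error:.1f}")

print("candidate_A in top 3:", in_top_k("candidate_A", ranked))
```

Repeating this check over many annotated molecules and averaging the results would give a top-3 rate comparable in spirit to the figure reported in the abstract.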

SUBMITTER: Domingo-Almenara X 

PROVIDER: S-EPMC6925099 | biostudies-literature | 2019 Dec

REPOSITORIES: biostudies-literature

Publications

The METLIN small molecule dataset for machine learning-based retention time prediction.

Xavier Domingo-Almenara, Carlos Guijas, Elizabeth Billings, J. Rafael Montenegro-Burke, Winnie Uritboonthai, Aries E. Aisporna, Emily Chen, H. Paul Benton, Gary Siuzdak

Nature Communications, 2019-12-20, issue 1


Similar Datasets

| S-EPMC9024754 | biostudies-literature
2022-05-26 | MTBLS2841 | MetaboLights
| S-EPMC10826801 | biostudies-literature
| S-EPMC10998092 | biostudies-literature
| S-EPMC10113922 | biostudies-literature
2013-01-01 | E-GEOD-29210 | biostudies-arrayexpress
| S-EPMC10101485 | biostudies-literature
| S-EPMC10611213 | biostudies-literature
| S-EPMC9403805 | biostudies-literature
2025-06-14 | GSE206127 | GEO