Metabolomics

Dataset Information


Employing fingerprinting of medicinal plants by means of LC-MS and machine learning for species identification task


ABSTRACT:

A dataset of liquid chromatography-mass spectrometry measurements of medicinal plant extracts from 76 species was generated and used to train and validate plant species identification algorithms. Various strategies for data handling and feature space selection were tested. Constrained Tucker decomposition, large-scale (more than 1500 variables) discrete Bayesian Networks, and autoencoder-based dimensionality reduction coupled with a continuous Bayes classifier and logistic regression were optimized to achieve the best accuracy. Classification algorithms based on Tucker decomposition of the original data, and logistic regression on the representation learned by the autoencoder, showed identification accuracy of up to 96%, outperforming various implementations of Bayesian Networks. The benefits and drawbacks of the approaches used are discussed. The tolerance of the methods to data variation introduced by different extraction methods and equipment was tentatively tested.


The main study is reported in this dataset, MTBLS688.

The Helianthus tuberosus assay is reported in MTBLS759.

INSTRUMENT(S): Liquid Chromatography MS - Alternating (LC-MS (Alternating))

SUBMITTER: Dmitry Nazarenko 

PROVIDER: MTBLS688 | MetaboLights | 2022-04-15

REPOSITORIES: MetaboLights


Publications

Employing fingerprinting of medicinal plants by means of LC-MS and machine learning for species identification task.

Pavel Kharyuk, Dmitry Nazarenko, Ivan Oseledets, Igor Rodin, Oleg Shpigun, Andrey Tsitsilin, Mikhail Lavrentyev

Scientific Reports, 2018-11-19, issue 1


A dataset of liquid chromatography-mass spectrometry measurements of medicinal plant extracts from 74 species was generated and used for training and validating plant species identification algorithms. Various strategies for data handling and feature space extraction were tested. Constrained Tucker decomposition, large-scale (more than 1500 variables) discrete Bayesian Networks and autoencoder-based dimensionality reduction coupled with continuous Bayes classifier and logistic regression were optimized […]

Similar Datasets

2013-01-22 | E-GEOD-39040 | biostudies-arrayexpress
2013-01-22 | E-GEOD-39052 | biostudies-arrayexpress
2013-01-22 | E-GEOD-39055 | biostudies-arrayexpress
2013-01-22 | E-GEOD-39057 | biostudies-arrayexpress
2013-01-22 | GSE39040 | GEO
2013-01-22 | GSE39057 | GEO
2013-01-22 | GSE39055 | GEO
2013-01-22 | GSE39052 | GEO
2021-10-26 | ST002015 | MetabolomicsWorkbench
2022-04-04 | GSE199668 | GEO