Dataset Information

Classifying natural products from plants, fungi or bacteria using the COCONUT database and machine learning.


ABSTRACT: Natural products (NPs) represent one of the most important resources for discovering new drugs. Here we asked whether the origin of NPs can be assigned from their molecular structure, using a subset of 60,171 NPs in the recently reported Collection of Open Natural Products (COCONUT) database that are assigned to plants, fungi, or bacteria. Visualizing this subset in an interactive tree-map (TMAP) calculated using MAP4 (MinHashed atom pair fingerprint) clustered NPs according to their assigned origin (https://tm.gdb.tools/map4/coconut_tmap/), and a support vector machine (SVM) trained with MAP4 correctly assigned the origin for 94% of plant, 89% of fungal, and 89% of bacterial NPs in this subset. An online tool based on an SVM trained with the entire subset correctly assigned the origin of further NPs with similar performance (https://np-svm-map4.gdb.tools/). Origin information might be useful when searching for biosynthetic genes of NPs isolated from plants but produced by endophytic microorganisms.
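
The MinHash idea behind a fingerprint such as MAP4 can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the feature strings are hypothetical placeholders rather than real atom-pair encodings, the function names (`minhash_signature`, `estimated_jaccard`, `nearest_origin`) are invented for illustration, and a simple nearest-neighbour lookup stands in for the SVM described in the abstract.

```python
import hashlib


def minhash_signature(features, num_hashes=64):
    """Return a MinHash signature for a non-empty set of string features.

    Each of the num_hashes slots holds the smallest salted 64-bit hash
    seen over all features.
    """
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(4, "little")  # a different salt per slot
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for f in features
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching slots, which estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def nearest_origin(query, labelled):
    """Return the label of the most similar labelled feature set (1-NN).

    Assumption: this is a stand-in for the SVM, not the authors' model.
    """
    query_sig = minhash_signature(query)
    best_label, _ = max(
        ((label, estimated_jaccard(query_sig, minhash_signature(feats)))
         for feats, label in labelled),
        key=lambda pair: pair[1],
    )
    return best_label


# Hypothetical toy feature sets; real MAP4 features are derived from atom pairs.
toy_training = [
    ({"C|2|O", "C|1|C", "C|3|N"}, "plant"),
    ({"S|1|C", "N|2|N", "P|4|O"}, "bacteria"),
]
```

Calling `nearest_origin({"C|2|O", "C|1|C", "C|3|N"}, toy_training)` returns the label of the identical training set, `"plant"`. In practice the signatures would presumably come from the MAP4 software and be passed to a trained classifier rather than compared one by one.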

SUBMITTER: Capecchi A 

PROVIDER: S-EPMC8524952 | biostudies-literature | 2021 Oct

REPOSITORIES: biostudies-literature

Publications

Classifying natural products from plants, fungi or bacteria using the COCONUT database and machine learning.

Capecchi Alice, Reymond Jean-Louis

Journal of Cheminformatics, 2021-10-18, 1


Similar Datasets

| S-EPMC9965265 | biostudies-literature
| S-EPMC5289882 | biostudies-literature
| S-EPMC10991582 | biostudies-literature
| S-EPMC9290337 | biostudies-literature
| S-EPMC10962494 | biostudies-literature
| S-EPMC9920077 | biostudies-literature
| S-EPMC8728154 | biostudies-literature
| S-EPMC5290640 | biostudies-literature
| S-EPMC3877901 | biostudies-literature
| S-EPMC9653489 | biostudies-literature