<HashMap><database>biostudies-literature</database><scores/><additional><submitter>Capecchi A</submitter><funding>European Research Council</funding><funding>schweizerischer nationalfonds zur förderung der wissenschaftlichen forschung</funding><pagination>82</pagination><full_dataset_link>https://www.ebi.ac.uk/biostudies/studies/S-EPMC8524952</full_dataset_link><repository>biostudies-literature</repository><omics_type>Unknown</omics_type><volume>13(1)</volume><pubmed_abstract>Natural products (NPs) represent one of the most important resources for discovering new drugs. Here we asked whether NP origin can be assigned from their molecular structure in a subset of 60,171 NPs in the recently reported Collection of Open Natural Products (COCONUT) database assigned to plants, fungi, or bacteria. Visualizing this subset in an interactive tree-map (TMAP) calculated using MAP4 (MinHashed atom pair fingerprint) clustered NPs according to their assigned origin (https://tm.gdb.tools/map4/coconut_tmap/), and a support vector machine (SVM) trained with MAP4 correctly assigned the origin for 94% of plant, 89% of fungal, and 89% of bacterial NPs in this subset. An online tool based on an SVM trained with the entire subset correctly assigned the origin of further NPs with similar performance (https://np-svm-map4.gdb.tools/). Origin information might be useful when searching for biosynthetic genes of NPs isolated from plants but produced by endophytic microorganisms.</pubmed_abstract><journal>Journal of cheminformatics</journal><pubmed_title>Classifying natural products from plants, fungi or bacteria using the COCONUT database and machine learning.</pubmed_title><pmcid>PMC8524952</pmcid><funding_grant_id>885076</funding_grant_id><funding_grant_id>200020_178998</funding_grant_id><pubmed_authors>Reymond JL</pubmed_authors><pubmed_authors>Capecchi A</pubmed_authors></additional><is_claimable>false</is_claimable><name>Classifying natural products from plants, fungi or bacteria using the COCONUT database and machine learning.</name><description>Natural products (NPs) represent one of the most important resources for discovering new drugs. Here we asked whether NP origin can be assigned from their molecular structure in a subset of 60,171 NPs in the recently reported Collection of Open Natural Products (COCONUT) database assigned to plants, fungi, or bacteria. Visualizing this subset in an interactive tree-map (TMAP) calculated using MAP4 (MinHashed atom pair fingerprint) clustered NPs according to their assigned origin (https://tm.gdb.tools/map4/coconut_tmap/), and a support vector machine (SVM) trained with MAP4 correctly assigned the origin for 94% of plant, 89% of fungal, and 89% of bacterial NPs in this subset. An online tool based on an SVM trained with the entire subset correctly assigned the origin of further NPs with similar performance (https://np-svm-map4.gdb.tools/). Origin information might be useful when searching for biosynthetic genes of NPs isolated from plants but produced by endophytic microorganisms.</description><dates><release>2021-01-01T00:00:00Z</release><publication>2021 Oct</publication><modification>2024-10-19T04:38:27.624Z</modification><creation>2024-10-19T04:38:27.624Z</creation></dates><accession>S-EPMC8524952</accession><cross_references><pubmed>34663470</pubmed><doi>10.1186/s13321-021-00559-3</doi></cross_references></HashMap>