{"database":"biostudies-literature","file_versions":[],"scores":null,"additional":{"submitter":["Capecchi A"],"funding":["European Research Council","schweizerischer nationalfonds zur förderung der wissenschaftlichen forschung"],"pagination":["82"],"full_dataset_link":["https://www.ebi.ac.uk/biostudies/studies/S-EPMC8524952"],"repository":["biostudies-literature"],"omics_type":["Unknown"],"volume":["13(1)"],"pubmed_abstract":["Natural products (NPs) represent one of the most important resources for discovering new drugs. Here we asked whether NP origin can be assigned from their molecular structure in a subset of 60,171 NPs in the recently reported Collection of Open Natural Products (COCONUT) database assigned to plants, fungi, or bacteria. Visualizing this subset in an interactive tree-map (TMAP) calculated using MAP4 (MinHashed atom pair fingerprint) clustered NPs according to their assigned origin ( https://tm.gdb.tools/map4/coconut_tmap/ ), and a support vector machine (SVM) trained with MAP4 correctly assigned the origin for 94% of plant, 89% of fungal, and 89% of bacterial NPs in this subset. An online tool based on an SVM trained with the entire subset correctly assigned the origin of further NPs with similar performance ( https://np-svm-map4.gdb.tools/ ). Origin information might be useful when searching for biosynthetic genes of NPs isolated from plants but produced by endophytic microorganisms."],"journal":["Journal of cheminformatics"],"pubmed_title":["Classifying natural products from plants, fungi or bacteria using the COCONUT database and machine learning."],"pmcid":["PMC8524952"],"funding_grant_id":["885076","200020_178998"],"pubmed_authors":["Reymond JL","Capecchi A"],"additional_accession":[]},"is_claimable":false,"name":"Classifying natural products from plants, fungi or bacteria using the COCONUT database and machine learning.","description":"Natural products (NPs) represent one of the most important resources for discovering new drugs. Here we asked whether NP origin can be assigned from their molecular structure in a subset of 60,171 NPs in the recently reported Collection of Open Natural Products (COCONUT) database assigned to plants, fungi, or bacteria. Visualizing this subset in an interactive tree-map (TMAP) calculated using MAP4 (MinHashed atom pair fingerprint) clustered NPs according to their assigned origin ( https://tm.gdb.tools/map4/coconut_tmap/ ), and a support vector machine (SVM) trained with MAP4 correctly assigned the origin for 94% of plant, 89% of fungal, and 89% of bacterial NPs in this subset. An online tool based on an SVM trained with the entire subset correctly assigned the origin of further NPs with similar performance ( https://np-svm-map4.gdb.tools/ ). Origin information might be useful when searching for biosynthetic genes of NPs isolated from plants but produced by endophytic microorganisms.","dates":{"release":"2021-01-01T00:00:00Z","publication":"2021 Oct","modification":"2024-10-19T04:38:27.624Z","creation":"2024-10-19T04:38:27.624Z"},"accession":"S-EPMC8524952","cross_references":{"pubmed":["34663470"],"doi":["10.1186/s13321-021-00559-3"]}}