
Dataset Information


67 million natural product-like compound database generated via molecular language processing.


ABSTRACT: Natural products are a rich resource of bioactive compounds for valuable applications across multiple fields such as food, agriculture, and medicine. For natural product discovery, high throughput in silico screening offers a cost-effective alternative to traditional resource-heavy assay-guided exploration of structurally novel chemical space. In this data descriptor, we report a characterized database of 67,064,204 natural product-like molecules generated using a recurrent neural network trained on known natural products, demonstrating a significant 165-fold expansion in library size over the approximately 400,000 known natural products. This study highlights the potential of using deep generative models to explore novel natural product chemical space for high throughput in silico discovery.

SUBMITTER: Tay DWP 

PROVIDER: S-EPMC10199072 | biostudies-literature | 2023 May

REPOSITORIES: biostudies-literature


Publications

67 million natural product-like compound database generated via molecular language processing.

Tay, Dillon W P; Yeo, Naythan Z X; Adaikkappan, Krishnan; Lim, Yee Hwee; Ang, Shi Jun

Scientific Data, 2023-05-19, issue 1



Similar Datasets

S-EPMC10168555 | biostudies-literature
S-EPMC11461638 | biostudies-literature
S-EPMC9584511 | biostudies-literature
S-EPMC10151863 | biostudies-literature
S-EPMC5954283 | biostudies-literature
S-EPMC8985928 | biostudies-literature
S-EPMC9653489 | biostudies-literature
S-EPMC6828779 | biostudies-literature
S-EPMC6230812 | biostudies-literature
S-EPMC11494223 | biostudies-literature