Dataset Information

A large-scale reaction dataset of mechanistic pathways of organic reactions.


ABSTRACT: Understanding organic reaction mechanisms is crucial for interpreting the formation of products at the atomic and electronic level, but it remains the domain of expert chemists. The lack of a large-scale dataset with chemically reasonable mechanistic sequences also hinders the development of reliable machine learning models that predict organic reactions based on mechanisms, as human chemists do. Here, we present mech-USPTO-31K, a high-quality reaction dataset and the first at large scale, with chemically reasonable arrow-pushing diagrams validated by synthetic chemists, encompassing a wide spectrum of polar organic reaction mechanisms. The dataset was curated with a simple and flexible method that automatically generates reaction mechanisms using autonomously extracted reaction templates and expert-coded mechanistic templates. We envision it becoming an invaluable tool for developing future reaction outcome prediction models and discovering new reactions.

SUBMITTER: Chen S 

PROVIDER: S-EPMC11316731 | biostudies-literature | 2024 Aug

REPOSITORIES: biostudies-literature

Publications

A large-scale reaction dataset of mechanistic pathways of organic reactions.

Shuan Chen, Ramil Babazade, Taewan Kim, Sunkyu Han, Yousung Jung

Scientific Data, 2024-08-10, 1


