
Dataset Information


Multistep retrosynthesis combining a disconnection aware triple transformer loop with a route penalty score guided tree search.


ABSTRACT: Computer-aided synthesis planning (CASP) aims to automatically learn organic reactivity from the literature and perform retrosynthesis of unseen molecules. CASP systems face two challenges: they must learn reactions precisely enough to propose realistic disconnections while avoiding overfitting, so as to leave room for diverse options, and they must explore possible routes in a way that allows short synthetic sequences to emerge. Herein we report an open-source CASP tool proposing original solutions to both challenges. First, we use a triple transformer loop (TTL) predicting starting materials (T1), reagents (T2), and products (T3) to explore various disconnection sites defined by combining systematic, template-based, and transformer-based tagging procedures. Second, we integrate TTL into a multistep tree search algorithm (TTLA) that prioritizes sequences using a route penalty score (RPScore), which considers the number of steps, their confidence scores, and the simplicity of all intermediates along the route. Our approach favours short synthetic routes to commercial starting materials, as exemplified by retrosynthetic analyses of recently approved drugs.
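
The idea of a penalty-guided route search can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not give the RPScore formula, so `route_penalty` is a hypothetical stand-in that simply adds the number of steps, the confidence shortfall of each step, and a per-molecule complexity value. The molecule names, the `disconnections` table, and the `complexity` values are toy placeholders, not output of the TTL models, and the search is a plain best-first search rather than the published TTLA.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    product: str
    precursors: tuple
    confidence: float  # hypothetical model confidence in (0, 1]


def route_penalty(steps, complexity):
    """Hypothetical penalty (lower is better); NOT the published RPScore."""
    n_steps = len(steps)
    confidence_shortfall = sum(1.0 - s.confidence for s in steps)
    molecule_complexity = sum(complexity[p] for s in steps for p in s.precursors)
    return n_steps + confidence_shortfall + molecule_complexity


def best_route(target, disconnections, commercial, complexity, max_steps=5):
    """Best-first search: always expand the partial route with the lowest penalty."""
    counter = 0  # unique tie-breaker so the heap never compares routes directly
    start_open = () if target in commercial else (target,)
    queue = [(0.0, counter, start_open, ())]
    while queue:
        penalty, _, open_mols, steps = heapq.heappop(queue)
        if not open_mols:  # every leaf is a commercial starting material
            return penalty, list(steps)
        if len(steps) >= max_steps:
            continue
        mol, rest = open_mols[0], open_mols[1:]
        for step in disconnections.get(mol, []):
            new_steps = steps + (step,)
            # Only non-commercial precursors still need to be disconnected.
            new_open = rest + tuple(p for p in step.precursors if p not in commercial)
            counter += 1
            heapq.heappush(
                queue,
                (route_penalty(new_steps, complexity), counter, new_open, new_steps),
            )
    return None


# Toy data (placeholders): two ways to make "target", one short and one long.
disconnections = {
    "target": [Step("target", ("A", "B"), 0.9), Step("target", ("C",), 0.95)],
    "C": [Step("C", ("D",), 0.8)],
    "D": [Step("D", ("A",), 0.9)],
}
commercial = {"A", "B"}
complexity = {"A": 0.1, "B": 0.1, "C": 0.5, "D": 0.4}

penalty, route = best_route("target", disconnections, commercial, complexity)
```

Because each term of this toy penalty is non-negative, extending a route never lowers its penalty, so the first complete route taken off the queue is the lowest-penalty one within `max_steps`. In the toy data the one-step route to the commercial materials is preferred over the three-step alternative, which mirrors the stated preference for short routes.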

SUBMITTER: Kreutter D 

PROVIDER: S-EPMC10510629 | biostudies-literature | 2023 Sep

REPOSITORIES: biostudies-literature


Publications

Multistep retrosynthesis combining a disconnection aware triple transformer loop with a route penalty score guided tree search.

Kreutter D, Reymond JL

Chemical Science, 2023-09-01, issue 36



Similar Datasets

| S-EPMC11474389 | biostudies-literature
| S-EPMC10390024 | biostudies-literature
| S-EPMC11869726 | biostudies-literature
| S-EPMC8965881 | biostudies-literature
| S-EPMC7643129 | biostudies-literature
| S-EPMC11520410 | biostudies-literature
| S-EPMC10892138 | biostudies-literature
| S-EPMC9906956 | biostudies-literature
| S-EPMC10716893 | biostudies-literature
| S-EPMC11919449 | biostudies-literature