Dataset Information

Improving the performance of models for one-step retrosynthesis through re-ranking.


ABSTRACT: Retrosynthesis is at the core of organic chemistry. Recently, the rapid growth of artificial intelligence (AI) has spurred a variety of novel machine learning approaches for data-driven synthesis planning. These methods learn complex patterns from reaction databases in order to predict, for a given product, sets of reactants that can be used to synthesise that product. However, their performance, as measured by the top-N accuracy in matching published reaction precedents, still leaves room for improvement. This work aims to enhance these models by learning to re-rank their reactant predictions. Specifically, we design and train an energy-based model that, for each product, ranks the published reaction as the top suggestion and the remaining reactant predictions below it. On the standard USPTO-50k benchmark dataset, we show that re-ranking can significantly improve one-step models: RetroSim, a similarity-based method, improves from 35.7 to 51.8% top-1 accuracy, and NeuralSym, a deep learning method, from 45.7 to 51.3%. We also show that re-ranking the union of two models' suggestions can lead to better performance than either model alone. However, the state-of-the-art top-1 accuracy is not improved by this method.
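
The re-ranking idea and the top-N accuracy metric described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the candidate strings, the `toy_energy` lookup table, and the function names `rerank` and `top_n_accuracy` are all hypothetical, and the sketch assumes the usual energy-based convention that a lower energy means a more plausible suggestion.

```python
from typing import Callable, Dict, List, Sequence


def rerank(candidates: Sequence[str], energy: Callable[[str], float]) -> List[str]:
    """Sort candidate reactant sets so the lowest-energy one comes first.

    Assumption: lower energy = more plausible suggestion.
    """
    return sorted(candidates, key=energy)


def top_n_accuracy(
    ranked: Sequence[Sequence[str]], truths: Sequence[str], n: int
) -> float:
    """Fraction of products whose published reactants are in the first n suggestions."""
    hits = sum(truth in preds[:n] for preds, truth in zip(ranked, truths))
    return hits / len(truths)


# Toy data (hypothetical): two products, each with a one-step model's ranked
# proposals and the published reactant set. Strings are placeholders, not
# real SMILES.
proposals = [["A.B", "C.D", "E.F"], ["G.H", "I.J"]]
published = ["C.D", "G.H"]

# Hypothetical stand-in for a trained energy-based model: a fixed lookup table.
toy_energy: Dict[str, float] = {
    "A.B": 0.9, "C.D": 0.1, "E.F": 0.5, "G.H": 0.2, "I.J": 0.7,
}

reranked = [rerank(p, toy_energy.__getitem__) for p in proposals]
before = top_n_accuracy(proposals, published, n=1)
after = top_n_accuracy(reranked, published, n=1)
print(f"top-1 before: {before:.2f}, after: {after:.2f}")
```

In this toy case the published reactants for the first product move from second to first place after re-ranking, so top-1 accuracy rises. Re-ranking the union of two models' suggestions would amount to concatenating and de-duplicating both proposal lists before calling `rerank`.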

SUBMITTER: Lin MH 

PROVIDER: S-EPMC8922884 | biostudies-literature | 2022 Mar

REPOSITORIES: biostudies-literature

Publications

Improving the performance of models for one-step retrosynthesis through re-ranking.

Lin Min Htoo, Tu Zhengkai, Coley Connor W

Journal of Cheminformatics, 2022-03-15, issue 1

Similar Datasets

| S-EPMC10249296 | biostudies-literature
| S-EPMC7643129 | biostudies-literature
| S-EPMC10229662 | biostudies-literature
| S-EPMC11869726 | biostudies-literature
| S-EPMC3933871 | biostudies-literature
| S-EPMC10147675 | biostudies-literature
| S-EPMC11742932 | biostudies-literature
| S-EPMC10390024 | biostudies-literature
| S-EPMC2759727 | biostudies-literature
| S-EPMC3645569 | biostudies-literature