
Dataset Information


Applying Large Graph Neural Networks to Predict Transition Metal Complex Energies Using the tmQM_wB97MV Data Set.


ABSTRACT: Machine learning (ML) methods have shown promise for discovering novel catalysts but are often restricted to specific chemical domains. Generalizable ML models require large and diverse training data sets, which exist for heterogeneous catalysis but not for homogeneous catalysis. The tmQM data set, which contains properties of 86,665 transition metal complexes calculated at the TPSSh/def2-SVP level of density functional theory (DFT), provided a promising training data set for homogeneous catalyst systems. However, we find that ML models trained on tmQM consistently underpredict the energies of a chemically distinct subset of the data. To address this, we present the tmQM_wB97MV data set, which filters out several structures in tmQM found to be missing hydrogens and recomputes the energies of all other structures at the ωB97M-V/def2-SVPD level of DFT. ML models trained on tmQM_wB97MV show no pattern of consistently incorrect predictions and much lower errors than those trained on tmQM. The ML models tested on tmQM_wB97MV were, from best to worst, GemNet-T > PaiNN ≈ SpinConv > SchNet. Performance consistently improves when using only neutral structures instead of the entire data set. However, while models saturate with only neutral structures, more data continue to improve the models when including charged species, indicating the importance of accurately capturing a range of oxidation states in future data generation and model development. Furthermore, a fine-tuning approach in which weights were initialized from models trained on OC20 led to drastic improvements in model performance, indicating transferability between ML strategies of heterogeneous and homogeneous systems.
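The neutral-only versus full-data-set comparison described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the record layout, the field names (`charge`, `energy`, `predicted`), and the toy values are hypothetical and are not taken from tmQM_wB97MV or from the authors' code. The sketch only shows how a subset might be selected by total charge and how a mean absolute error could be compared between the two subsets.

```python
from statistics import mean

# Hypothetical toy records. Field names and values are illustrative
# assumptions only; they are NOT real tmQM_wB97MV entries.
records = [
    {"id": "A", "charge": 0, "energy": -1.0, "predicted": -1.1},
    {"id": "B", "charge": 1, "energy": -2.0, "predicted": -2.4},
    {"id": "C", "charge": 0, "energy": -3.0, "predicted": -2.9},
    {"id": "D", "charge": -1, "energy": -4.0, "predicted": -3.5},
]


def neutral_only(rows):
    """Keep only structures whose total charge is zero."""
    return [r for r in rows if r["charge"] == 0]


def mean_absolute_error(rows):
    """Average absolute difference between predicted and reference energy."""
    return mean(abs(r["predicted"] - r["energy"]) for r in rows)


mae_all = mean_absolute_error(records)
mae_neutral = mean_absolute_error(neutral_only(records))

print(f"MAE, all structures:     {mae_all:.3f}")
print(f"MAE, neutral structures: {mae_neutral:.3f}")
```

With real data, the same two-way split would be applied to a model's held-out predictions; the toy numbers here say nothing about the errors reported in the paper.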

SUBMITTER: Garrison AG 

PROVIDER: S-EPMC10751796 | biostudies-literature | 2023 Dec

REPOSITORIES: biostudies-literature


Publications

Applying Large Graph Neural Networks to Predict Transition Metal Complex Energies Using the tmQM_wB97MV Data Set.

Garrison AG, Heras-Domingo J, Kitchin JR, Dos Passos Gomes G, Ulissi ZW, Blau SM

Journal of Chemical Information and Modeling, 2023-12-04, issue 24



Similar Datasets

| S-EPMC11904306 | biostudies-literature
| S-EPMC9044246 | biostudies-literature
| S-EPMC11522678 | biostudies-literature
| S-EPMC10319785 | biostudies-literature
| S-EPMC6100542 | biostudies-literature
| S-EPMC9683700 | biostudies-literature
| S-EPMC10024712 | biostudies-literature
| S-EPMC11447034 | biostudies-literature
| S-EPMC9714572 | biostudies-literature
| S-EPMC10311302 | biostudies-literature