Project description: Accurately predicting activation energies is crucial for understanding chemical reactions and modeling complex reaction systems. However, the high computational cost of quantum chemistry methods often limits the feasibility of large-scale studies, leading to a scarcity of high-quality activation energy data. In this work, we explore and compare three approaches (transfer learning, delta learning, and feature engineering) to enhance the accuracy of activation energy predictions using graph neural networks, specifically focusing on methods that incorporate low-cost, low-level computational data. Using the Chemprop model, we systematically evaluated how these methods leverage data from semiempirical quantum mechanics (SQM) calculations to improve predictions. Delta learning, which adjusts low-level SQM activation energies to align with high-level CCSD(T)-F12a targets, emerged as the most effective method, achieving high accuracy with substantially reduced data requirements. Notably, delta learning trained with just 20-30% of high-level data matched or exceeded the performance of other methods trained with full data sets, making it advantageous in data-scarce scenarios. However, its reliance on transition state searches imposes significant computational demands during model application. Transfer learning, which pretrains models on large data sets of low-level data, provided mixed results, particularly when there was a mismatch in the reaction distributions between the training and target data sets. Feature engineering, which involves adding computed molecular properties as input features, showed modest gains, particularly when thermodynamic properties were used. Our study highlights the trade-offs between accuracy and computational demand in selecting the best approach for enhancing activation energy predictions.
These insights provide valuable guidelines for researchers aiming to apply machine learning in chemical reaction engineering, helping to balance accuracy with resource constraints.
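
The delta learning idea described above can be illustrated with a minimal sketch. It is not the Chemprop workflow from the study: the graph neural network is replaced by a one-feature linear fit, the descriptor is hypothetical, and all numbers are synthetic values chosen for illustration. The point is only the structure: the model is trained on the difference between high-level and low-level activation energies, and a prediction adds that learned correction back onto a low-level value.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = y_bar - a * x_bar
    return a, b


def train_delta_model(features, ea_low, ea_high):
    """Fit the correction (high-level minus low-level Ea), not Ea itself."""
    deltas = [high - low for high, low in zip(ea_high, ea_low)]
    return fit_linear(features, deltas)


def predict_high_level(model, feature, ea_low):
    """High-level estimate = low-level Ea + learned correction."""
    a, b = model
    return ea_low + (a * feature + b)


# Synthetic training set (hypothetical values, kcal/mol).
features = [1.0, 2.0, 3.0, 4.0]      # stand-in reaction descriptor
ea_low = [20.0, 35.0, 50.0, 65.0]    # low-level (SQM-like) activation energies
ea_high = [23.0, 40.0, 57.0, 74.0]   # high-level reference activation energies

model = train_delta_model(features, ea_low, ea_high)

# Applying the model to a new reaction still needs its low-level Ea,
# which is why a transition state search is required at prediction time.
prediction = predict_high_level(model, feature=5.0, ea_low=80.0)
print(f"Predicted high-level Ea: {prediction:.2f} kcal/mol")
```

Because the correction is usually smaller and smoother than the activation energy itself, it can plausibly be learned from fewer high-level data points, which is consistent with the data efficiency reported in the abstract.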

Project description: Conventional analysis of fluorescence recovery after photobleaching (FRAP) data for diffusion coefficient estimation typically involves fitting an analytical or numerical FRAP model to the recovery curve data using non-linear least squares. Depending on the model, this can be time-consuming, especially for batch analysis of large numbers of data sets and when multiple initial guesses for the parameter vector are used to ensure convergence. In this work, we develop a completely new approach, DeepFRAP, utilizing machine learning for parameter estimation in FRAP. From a numerical FRAP model developed in previous work, we generate a very large set of simulated recovery curve data with realistic noise levels. The data are used for training different deep neural network regression models for prediction of several parameters, most importantly the diffusion coefficient. The neural networks are extremely fast and can estimate the parameters orders of magnitude faster than least squares. The performance of the neural network estimation framework is compared to conventional least squares estimation on simulated data, and found to be strikingly similar. Also, a simple experimental validation is performed, demonstrating excellent agreement between the two methods. We make the data and code used publicly available to facilitate further development of machine learning-based estimation in FRAP. LAY DESCRIPTION: Fluorescence recovery after photobleaching (FRAP) is one of the most frequently used methods for microscopy-based diffusion measurements and is broadly used in materials science, pharmaceutics, food science and cell biology. In a FRAP experiment, a laser is used to photobleach fluorescent particles in a region. By analysing the recovery of the fluorescence intensity due to the diffusion of still fluorescent particles, the diffusion coefficient and other parameters can be estimated.
Typically, a confocal laser scanning microscope (CLSM) is used to image the time evolution of the recovery, and a model is fit using least squares to obtain parameter estimates. In this work, we introduce a new, fast and accurate method for analysis of data from FRAP. The new method is based on using artificial neural networks to predict parameter values, such as the diffusion coefficient, effectively circumventing classical least squares fitting. This leads to a dramatic speed-up, especially noticeable when analysing large numbers of FRAP data sets, while still producing results in excellent agreement with least squares. Further, the neural network estimates can be used as very good initial guesses for least squares estimation, so that the least squares optimization converges much faster than it otherwise would. This makes it possible to obtain, for example, diffusion coefficients as quickly as possible, with minimal time spent on data analysis. In this fashion, the proposed method facilitates efficient use of the experimentalist's time, which is the main motivation for our approach. The concept is demonstrated on pure diffusion. However, it can easily be extended to the diffusion and binding case. The concept is likely to be useful in all application areas of FRAP, including diffusion in cells, gels and solutions.
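
The remark about using a fast estimate as the initial guess for least squares can be sketched as follows. This is an illustration under strong assumptions, not the DeepFRAP code: the recovery curve is a simple single-exponential stand-in for the numerical FRAP model, the fit is a one-parameter Gauss-Newton iteration, the data are noise-free, and the "network estimate" is a hard-coded hypothetical value rather than the output of a trained network.

```python
import math


def recovery(t, k):
    """Toy recovery curve 1 - exp(-k t); k stands in for a diffusion-related rate."""
    return 1.0 - math.exp(-k * t)


def fit_rate(times, values, k0, tol=1e-10, max_iter=100):
    """One-parameter Gauss-Newton least squares fit; returns (k, iterations)."""
    k = k0
    for iteration in range(1, max_iter + 1):
        # Derivative of the model with respect to k at each time point.
        jac = [t * math.exp(-k * t) for t in times]
        res = [y - recovery(t, k) for t, y in zip(times, values)]
        step = sum(j * r for j, r in zip(jac, res)) / sum(j * j for j in jac)
        k += step
        if abs(step) < tol:
            return k, iteration
    return k, max_iter


# Simulated, noise-free recovery data with a known rate (hypothetical values).
k_true = 0.5
times = [0.5 * i for i in range(1, 21)]
values = [recovery(t, k_true) for t in times]

# A poor generic starting point versus a guess close to the truth,
# the latter playing the role of a neural network estimate.
k_poor, iters_poor = fit_rate(times, values, k0=0.05)
k_good, iters_good = fit_rate(times, values, k0=0.45)

print(f"poor guess: k = {k_poor:.6f} after {iters_poor} iterations")
print(f"good guess: k = {k_good:.6f} after {iters_good} iterations")
```

Both runs reach the same rate, but the run started near the solution should need fewer iterations. With a more expensive forward model, as in a numerical FRAP simulation, each saved iteration corresponds to a saved model evaluation.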

Dataset Information

The Impact of Data on Structure-Based Binding Affinity Predictions Using Deep Neural Networks
