Transferable and Transparent Energy Decomposition-Based Machine Learning Models for Computing Accurate Reaction Energetics

Jacinto-Mejía, Carlos R.; Storchi, Loriano; Bistoni, Giovanni

doi:10.1021/acs.jctc.5c01184

We present a transferable, interpretable, and modular machine-learning framework that enhances the accuracy of density functional theory (DFT) reaction energies using physically meaningful energy-decomposition descriptors. Reaction energies computed at the DFT level with standard basis sets are first decomposed into chemically intuitive contributions, such as kinetic and potential energy, which are then used to train a library of linear regression (LR) models. This library includes a general-purpose model that reduces the mean absolute percentage error (MAPE) relative to gold-standard CCSD(T)/CBS reference values by up to 63% compared to uncorrected DFT across extended benchmark sets. In parallel, a series of specialized LR models provides improved accuracy for specific reaction classes. A random forest (RF) classifier dynamically selects the optimal model for each case, pushing accuracy further and achieving a MAPE reduction of up to 123 percentage points, all while maintaining full model interpretability. In a rigorous out-of-distribution stress test on the WCCR10 data set (containing transition-metal complexes absent from training), both the general LR model and the RF/LR pipeline retain robust performance. Unlike typical neural network models, which often face generalization challenges beyond their training set, our framework maintains stable performance outside its training domain.
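
The abstract quotes one improvement as a percentage and another in percentage points. The minimal sketch below illustrates how those two measures differ, assuming the standard definition of MAPE (the mean of |predicted - reference| / |reference|, expressed in percent). The function name `mape` and all energy values are hypothetical and chosen only for illustration; they are not data from the paper.

```python
def mape(predicted, reference):
    """Mean absolute percentage error, in percent, relative to the reference values."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    terms = [abs(p - r) / abs(r) for p, r in zip(predicted, reference)]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical reaction energies in kcal/mol (made-up values, not from the paper).
reference = [10.0, -20.0, 5.0, 40.0]   # stand-in for CCSD(T)/CBS references
dft = [12.0, -26.0, 7.0, 44.0]         # stand-in for uncorrected DFT
corrected = [10.5, -21.0, 5.5, 41.0]   # stand-in for ML-corrected DFT

mape_dft = mape(dft, reference)
mape_corrected = mape(corrected, reference)

# Absolute reduction: difference of the two MAPE values, in percentage points.
reduction_points = mape_dft - mape_corrected

# Relative reduction: the same difference as a percent of the uncorrected MAPE.
reduction_relative = 100.0 * reduction_points / mape_dft

print(f"MAPE (DFT):       {mape_dft:.1f} %")
print(f"MAPE (corrected): {mape_corrected:.1f} %")
print(f"Reduction:        {reduction_points:.1f} percentage points")
print(f"Reduction:        {reduction_relative:.1f} % relative")
```

Because each term is divided by the reference energy, reactions with reference values close to zero can push MAPE well above 100%, which is why a reduction of more than 100 percentage points is arithmetically possible.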