Transferable and Transparent Energy Decomposition-Based Machine Learning Models for Computing Accurate Reaction Energetics

Storchi, Loriano
2025-01-01

Abstract

We present a transferable, interpretable, and modular machine-learning framework that improves the accuracy of density functional theory (DFT) reaction energies using physically meaningful energy-decomposition descriptors. Reaction energies computed at the DFT level with standard basis sets are first decomposed into chemically intuitive contributions, such as kinetic and potential energy, which are then used to train a library of linear regression (LR) models. The library includes a general-purpose model that reduces the mean absolute percentage error (MAPE) relative to gold-standard CCSD(T)/CBS reference values by up to 63% compared to uncorrected DFT across extended benchmark sets. In parallel, a series of specialized LR models provides improved accuracy for specific reaction classes. A random forest (RF) classifier dynamically selects the optimal model for each reaction, pushing accuracy further and achieving a MAPE reduction of up to 123 percentage points while maintaining full model interpretability. In a rigorous out-of-distribution stress test on the WCCR10 data set, which contains transition-metal complexes absent from training, both the general LR model and the RF/LR pipeline retain robust performance. Unlike typical neural network models, which often struggle to generalize beyond their training set, our framework maintains stable performance outside its training domain.
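
The routing idea described in the abstract (a general LR model, specialized LR models, and an RF classifier that picks one model per reaction) can be illustrated with a minimal sketch. This is not the authors' implementation. The data are synthetic, the two descriptor columns only stand in for energy-decomposition terms, and the function names (`make_synthetic_reactions`, `fit_pipeline`, `predict_routed`) and the choice to label each training reaction by its lowest-error model are assumptions made for illustration. It uses NumPy and scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(100.0 * np.mean(np.abs((y_pred - y_true) / y_true)))


def make_synthetic_reactions(n, rng):
    """Synthetic stand-in data (assumption): two descriptor columns and a
    reference energy that depends on them differently in each of two classes."""
    cls = rng.integers(0, 2, size=n)
    x0 = rng.uniform(10.0, 30.0, size=n) + 20.0 * cls  # class shifts this term
    x1 = rng.uniform(10.0, 50.0, size=n)
    X = np.column_stack([x0, x1])
    weights = np.array([[1.0, 0.9], [0.6, 1.2]])  # one weight row per class
    y = np.sum(X * weights[cls], axis=1) + rng.normal(scale=0.5, size=n)
    return X, cls, y


def fit_pipeline(X, cls, y):
    """Fit a general LR model, one LR model per class, and an RF selector."""
    models = [LinearRegression().fit(X, y)]  # index 0: general-purpose model
    for c in np.unique(cls):
        mask = cls == c
        models.append(LinearRegression().fit(X[mask], y[mask]))
    # Assumed labeling scheme: the model with the lowest absolute training error.
    errors = np.column_stack([np.abs(m.predict(X) - y) for m in models])
    best = np.argmin(errors, axis=1)
    selector = RandomForestClassifier(n_estimators=100, random_state=0)
    selector.fit(X, best)
    return models, selector


def predict_routed(models, selector, X):
    """Predict each reaction with the model chosen by the RF selector."""
    choice = selector.predict(X)
    all_preds = np.column_stack([m.predict(X) for m in models])
    return all_preds[np.arange(len(X)), choice]


rng = np.random.default_rng(0)
X_train, cls_train, y_train = make_synthetic_reactions(400, rng)
X_test, _, y_test = make_synthetic_reactions(200, rng)

models, selector = fit_pipeline(X_train, cls_train, y_train)
general_pred = models[0].predict(X_test)
routed_pred = predict_routed(models, selector, X_test)

print("general LR MAPE (%):", mape(y_test, general_pred))
print("RF-routed LR MAPE (%):", mape(y_test, routed_pred))
print("general LR coefficients:", models[0].coef_)
```

Each prediction comes from a single linear model, so its coefficients (`coef_`) can be read directly, which is the sense in which such a pipeline stays interpretable. When comparing MAPE values, note that a relative reduction (for example, from 20% to 10% is a 50% reduction) and a reduction in percentage points (the same change is 10 percentage points) are different measures.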
2025
English
21
21
10853
10862
10
no
3
info:eu-repo/semantics/article
262
Jacinto-Mejía, Carlos R.; Storchi, Loriano; Bistoni, Giovanni
1 Journal Contribution::1.1 Journal article
none

Use this identifier to cite or link to this document: https://hdl.handle.net/11564/886574