Q-DFTNet: A chemistry-informed neural network framework for predicting molecular dipole moments via DFT-Driven QM9 data

Bibliographic Details
Main Authors: Wayo, Dennis Delali Kwesi, Mohd Zulkifli, Mohamad Noor, Ganji, Masoud Darvish, Saporetti, Camila M., Goliatt, Leonardo
Format: Article
Language: English
Published: John Wiley and Sons Inc. 2025
Online Access: https://umpir.ump.edu.my/id/eprint/47149/1/Q-DFTNet_A%20chemistry-informed%20neural%20network%20framework.pdf
https://doi.org/10.1002/jcc.70206
https://umpir.ump.edu.my/id/eprint/47149/
Description
Summary: This study presents Q-DFTNet, a chemistry-informed neural network (ChINN) framework designed to benchmark graph neural networks (GNNs) for dipole moment prediction on the QM9 dataset. Seven GNN architectures (GCN, GIN, GraphConv, GATConv, GATNet, SAGEConv, and GIN+EdgeConv) were trained for 100 epochs and evaluated on performance and interpretability metrics. GraphConv achieved the lowest test MSE (0.7054) and MAE (0.6196) and the highest R² (0.6513) with only 16.5k trainable parameters, the best accuracy-complexity trade-off among the models compared. GIN+EdgeConv followed closely with an MSE of 0.7386, an MAE of 0.6332, and an R² of 0.6349, leveraging edge-awareness for enhanced expressivity. In contrast, the attention-based models GATConv and GATNet underperformed, with test MSEs of 0.9667 and 1.0096 and R² values of 0.5221 and 0.5009, respectively, despite their higher complexity (43.5k and 37.3k parameters). Latent space analysis via t-SNE, PCA, and UMAP showed superior cluster separability for GraphConv, GIN+EdgeConv, and GCN. Clustering metrics corroborated these observations: GraphConv achieved a Silhouette Score of 0.4665, a Davies–Bouldin Index of 0.7111, and a Calinski–Harabasz Score of 1278.40. Cluster-wise mean molecular dipole moments for GIN+EdgeConv ranged from 2.6221 to 2.9606 Debye, reflecting high semantic coherence. Residual analysis and QQ plots showed that models with lower MSEs also had near-Gaussian error distributions, which makes their errors easier to interpret.
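
The regression metrics quoted in the summary (MSE, MAE, and the coefficient of determination) can be illustrated with a minimal sketch. It uses the standard textbook definitions and is not the authors' evaluation code. The function names `regression_metrics` and `implied_target_variance` and the sample dipole values are hypothetical, chosen only for illustration. The second helper relies on the identity R² = 1 - MSE / Var(y), which holds when both metrics are computed on the same test set.

```python
from math import fsum


def regression_metrics(y_true, y_pred):
    """Return (mse, mae, r2) for two equal-length sequences of floats."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = fsum(r * r for r in residuals)           # residual sum of squares
    mse = ss_res / n
    mae = fsum(abs(r) for r in residuals) / n
    mean_true = fsum(y_true) / n
    ss_tot = fsum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return mse, mae, r2


def implied_target_variance(mse, r2):
    """Variance of the targets implied by an (MSE, R2) pair on one test set."""
    return mse / (1.0 - r2)


# Hypothetical dipole moments (illustrative values, not QM9 data).
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
mse, mae, r2 = regression_metrics(reference, predicted)

# (test MSE, R2) pairs as reported in the summary above.
reported = {
    "GraphConv": (0.7054, 0.6513),
    "GIN+EdgeConv": (0.7386, 0.6349),
    "GATConv": (0.9667, 0.5221),
    "GATNet": (1.0096, 0.5009),
}
implied = {name: implied_target_variance(m, r) for name, (m, r) in reported.items()}
```

Applied to the four reported (MSE, R²) pairs, the helper yields an implied target variance of roughly 2.02 in every case, which is what one would expect if all models were scored on the same test split.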