Successfully deploying graph neural networks (GNNs) in practical applications, particularly within fields like healthcare and sensor technology, requires effective handling of missing node features. Researchers Francesco Ferrini, Veronica Lachi, and Antonio Longa of the University of Trento, together with Bruno Lepri (Fondazione Bruno Kessler) and Matono Akiyoshi (AIST Tokyo), have identified a critical gap in current GNN evaluation. Their work challenges the assumption that existing models are genuinely robust to missing data, demonstrating that high sparsity in benchmark datasets often masks performance limitations. This research introduces novel datasets with dense, meaningful features and realistic missingness mechanisms, together with a new baseline model, GNNmim, offering a more rigorous evaluation framework and a competitive solution for node classification with incomplete data.
Existing studies predominantly address relatively benign scenarios, namely benchmark datasets with high-dimensional but sparse node features and incomplete data generated under Missing Completely At Random (MCAR) mechanisms. The researchers theorise that high sparsity substantially limits the information loss caused by missingness, artificially inflating the apparent robustness of all models and hindering meaningful performance comparisons. To address this limitation, they introduce one synthetic dataset alongside three real-world datasets, all characterised by dense, semantically meaningful features. Furthermore, the study moves beyond the constraints of MCAR by designing evaluation protocols that incorporate more realistic missingness mechanisms.
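The article does not reproduce the authors' code, but a minimal sketch of what such missingness protocols can look like is shown below, written with NumPy. The MCAR mask drops entries uniformly at random, while the MNAR rule ties an entry's probability of being dropped to its own (unobserved) value; the quantile-based rule, the rates, and the function names are illustrative assumptions rather than the paper's exact design, and the last few lines mimic the train-test shift discussed later, where missingness is milder at training time than at deployment.

```python
import numpy as np

rng = np.random.default_rng(0)

def mcar_mask(X, p):
    """Missing Completely At Random: each entry is dropped independently
    with probability p, regardless of its value."""
    return rng.random(X.shape) < p

def mnar_mask(X, p):
    """Missing Not At Random (illustrative rule): entries with larger values
    are more likely to be dropped, so missingness depends on the unobserved
    feature values themselves. The overall drop rate is roughly p."""
    ranks = X.argsort(axis=0).argsort(axis=0) / (len(X) - 1)  # per-feature quantile in [0, 1]
    return rng.random(X.shape) < np.clip(2 * p * ranks, 0, 1)

# Dense, low-dimensional node features (e.g. physical measurements).
X = rng.normal(size=(1000, 8))

# Train-test shift: missingness is milder during training than at deployment.
X_train = np.where(mcar_mask(X, p=0.3), np.nan, X)
X_test = np.where(mnar_mask(X, p=0.6), np.nan, X)
print(f"train missing: {np.isnan(X_train).mean():.2f}, test missing: {np.isnan(X_test).mean():.2f}")
```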
Dense Feature Sets for Robust GNN Evaluation
Researchers addressed a critical challenge in deploying graph neural networks (GNNs), namely handling missing node features, by meticulously re-evaluating existing methodologies and pioneering new evaluation protocols. The study began by theoretically demonstrating that the high sparsity inherent in commonly used benchmark datasets obscures meaningful comparisons between models, as the additional information loss caused by missingness is minimal when features are already largely absent. To circumvent this, the team engineered one synthetic and three real-world datasets characterised by dense, semantically meaningful features, providing a more robust foundation for assessing model performance. The work moved beyond simplistic evaluations employing MCAR mechanisms, instead designing protocols that incorporate more realistic missingness scenarios.
Scientists established a theoretical framework to explicitly define assumptions regarding the missingness process and analyse their implications for different GNN architectures. This framework informed the creation of evaluation regimes that included representative instances of both MCAR and Missing Not At Random (MNAR) mechanisms, mirroring real-world complexities where missingness correlates with feature values or prediction targets. Furthermore, the researchers accounted for potential discrepancies between training and test data distributions, simulating scenarios where missingness patterns shift over time due to factors like sensor failures. To provide a baseline for comparison, the study introduced GNNmim, a new GNN model leveraging the Missing Indicator Method.
This approach augments node feature matrices with binary masks denoting missing values, enabling standard GNN architectures to process incomplete data without requiring complex learned imputation strategies. Experiments demonstrated that GNNmim achieves competitive performance against specialised architectures across the newly constructed datasets and diverse missingness regimes. This methodological innovation, combining carefully curated datasets with realistic evaluation protocols and a streamlined model, offers a significant advancement in the reliable assessment of GNN robustness to incomplete feature data.
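The article does not include the model's implementation; as a rough sketch of how the Missing Indicator Method can be attached to an otherwise standard GNN, the code below zero-imputes missing entries, concatenates the features with a binary missingness mask, and passes the result through a two-layer GCN built with PyTorch Geometric. The class name, the GCN backbone, the hidden size, and the zero imputation are assumptions made for illustration, not the authors' published architecture.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GNNmim(torch.nn.Module):
    """Missing Indicator Method baseline (illustrative sketch): missing
    entries are filled with a constant and the feature matrix is
    concatenated with a binary mask marking missing positions, then
    processed by an otherwise standard two-layer GCN."""

    def __init__(self, in_dim, hidden_dim, num_classes):
        super().__init__()
        # Input width doubles: original features plus their missingness mask.
        self.conv1 = GCNConv(2 * in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, num_classes)

    def forward(self, x, edge_index):
        missing = torch.isnan(x).float()          # binary mask: 1 where an entry is missing
        x_filled = torch.nan_to_num(x, nan=0.0)   # constant imputation; nothing is learned here
        h = torch.cat([x_filled, missing], dim=1)
        h = F.relu(self.conv1(h, edge_index))
        return self.conv2(h, edge_index)          # class logits per node
```

Because the mask simply arrives as extra input channels, no learned imputation step is needed: the message-passing layers themselves can learn to weigh observed and missing entries differently.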
Sparsity Masks GNN Robustness Claims
Scientists have demonstrated that high sparsity in node features substantially limits the information loss caused by missing data, leading to an overestimation of robustness in existing Graph Neural Network (GNN) models. Their work reveals that performance appears consistently high across models when tested on benchmark datasets with sparse features, effectively preventing meaningful comparison. To address this, the team introduced a novel synthetic dataset alongside three real-world datasets characterised by dense, semantically meaningful features, a crucial step towards more realistic evaluation. Experiments revealed that, on sparse benchmarks, existing GNN-based methods maintain high performance even when up to 90% of feature entries are removed, casting doubt on the validity of current benchmarks for assessing true model robustness.
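A back-of-the-envelope construction (ours, not an experiment from the paper) makes the sparsity argument concrete: when dropped entries are imputed with zeros, removing 90% of the entries of a roughly 1%-dense bag-of-words matrix changes only about 1% of its values, whereas the same removal changes about 90% of the values of a dense matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 1000

# Sparse bag-of-words-style features (~1% nonzero, as in citation benchmarks)
# versus dense real-valued features, where every entry carries information.
sparse = (rng.random((n, d)) < 0.01).astype(float)
dense = rng.normal(size=(n, d))

def fraction_altered(X, p):
    """Drop a fraction p of entries at random, impute zeros, and measure how
    many entries actually differ from their true values afterwards."""
    dropped = rng.random(X.shape) < p
    return (np.where(dropped, 0.0, X) != X).mean()

for p in (0.5, 0.9, 0.99):
    print(f"removed={p:.0%}  sparse altered={fraction_altered(sparse, p):.2%}  "
          f"dense altered={fraction_altered(dense, p):.2%}")
```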
The research meticulously details the construction of these new datasets, featuring naturally low-dimensional and interpretable raw features such as physical measurements. This focus on dataset quality aligns with recent calls for improved benchmark design within the graph machine learning community. The study highlights that mutual information between features and labels remains largely unaffected by missingness until extremely high removal rates are reached, exceeding 90% of entries. Further investigation focused on the limitations of current evaluation protocols, which predominantly employ MCAR mechanisms.
The team designed more realistic scenarios incorporating both MCAR and MNAR mechanisms, the latter linking the probability of missingness to unobserved feature values. They also introduced train-test distribution shifts, simulating real-world deployments where missing data patterns differ between training and testing phases. Experiments confirm that existing methods struggle to maintain performance under these more complex conditions, demonstrating a lack of robustness beyond simplified benchmarks. Building on this analysis, scientists proposed GNNmim, a simple yet effective baseline model for node classification with incomplete feature data.
GNNmim augments the node feature matrix with a binary mask indicating missing features, processing this representation with a standard GNN without requiring learned imputation. Experiments show that GNNmim achieves competitive performance across diverse datasets and missingness regimes, demonstrating its robustness and serving as a valuable baseline for future research. The work delivers a principled evaluation framework, establishing a foundation for more meaningful and reliable advancements in GNNs with missing features.
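Putting the pieces together, the shifted-missingness evaluation described above might be sketched as follows. It reuses the hypothetical `GNNmim`, `mcar_mask`, and `mnar_mask` sketches from earlier and substitutes a small random graph for the paper's datasets, so it only illustrates the shape of the protocol rather than reproducing any reported result.

```python
import torch
import torch.nn.functional as F

# Toy stand-in for one of the dense-feature graphs (an assumption for illustration).
num_nodes, num_feats, num_classes = 500, 8, 3
x = torch.randn(num_nodes, num_feats)
y = torch.randint(0, num_classes, (num_nodes,))
edge_index = torch.randint(0, num_nodes, (2, 4000))
train_nodes, test_nodes = torch.arange(0, 400), torch.arange(400, 500)

def corrupt(features, mask):
    # Mark dropped entries as NaN; GNNmim turns them into zeros plus a mask.
    return features.masked_fill(torch.as_tensor(mask), float("nan"))

# Shifted missingness: MCAR corruption at training time, harsher MNAR at test time.
x_train = corrupt(x, mcar_mask(x.numpy(), p=0.3))
x_test = corrupt(x, mnar_mask(x.numpy(), p=0.6))

model = GNNmim(num_feats, hidden_dim=64, num_classes=num_classes)
optim = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

for epoch in range(200):
    model.train()
    optim.zero_grad()
    loss = F.cross_entropy(model(x_train, edge_index)[train_nodes], y[train_nodes])
    loss.backward()
    optim.step()

model.eval()
with torch.no_grad():
    pred = model(x_test, edge_index).argmax(dim=1)
acc = (pred[test_nodes] == y[test_nodes]).float().mean().item()
print(f"test accuracy under shifted missingness: {acc:.3f}")
```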
Realistic Missingness Reveals GNN Weaknesses
This research addresses a critical limitation in the evaluation of graph neural networks (GNNs), specifically their performance when faced with missing node features. The authors demonstrate that existing benchmark datasets, characterised by high-dimensional sparse features and missing data generated under ‘Missing Completely At Random’ conditions, often fail to provide a meaningful assessment of robustness. They establish that high feature sparsity can mask the impact of missingness, leading to overly optimistic performance evaluations. To overcome this, the study introduces a new suite of datasets, incorporating dense, semantically meaningful features, alongside evaluation protocols employing more realistic missingness mechanisms than previously used.
Through theoretical analysis and empirical results, the researchers highlight the sensitivity of many GNNs to the type of missingness, showing that performance on standard benchmarks does not reliably predict performance under more complex scenarios. They propose GNNmim, a baseline model that demonstrates competitive and consistently robust performance across diverse datasets and missingness regimes. The authors acknowledge that their analysis focuses on specific missingness mechanisms and datasets, and that further investigation is needed to explore the generalisability of their findings. Future work, they suggest, could extend to different GNN architectures and more complex real-world scenarios. Nevertheless, this work provides a valuable contribution by establishing a more rigorous framework for evaluating GNN robustness and demonstrating that achieving broad robustness to realistic missing data is possible, even with relatively simple models.
👉 More information
🗞 Rethinking GNNs and Missing Features: Challenges, Evaluation and a Robust Solution
🧠 ArXiv: https://arxiv.org/abs/2601.04855
