PLGC Achieves Label-Free Graph Condensation, Matching Data Statistics Without Ground-Truth Labels

The increasing size of graph datasets presents a significant challenge to training graph neural networks (GNNs), demanding substantial computational resources. Researchers Jay Nandy, Arnab Kumar Mondal, and Anuj Rathore, all from Fujitsu Research of India, together with Mahesh Chandran, have addressed this issue with a novel approach called Pseudo-Labeled Graph Condensation (PLGC). The framework generates smaller, synthetic graphs that accurately represent the original data, and crucially does so without relying on supervised labels. PLGC diagnoses the limitations of existing condensation methods when labels are unreliable and introduces a self-supervised technique that derives pseudo-labels from node embeddings. Through both theoretical guarantees and empirical results across node classification and link prediction, the team demonstrates PLGC’s competitive performance and remarkable robustness in noisy or weakly-labeled environments, a significant step forward for practical graph analysis.

These computational demands are particularly acute when resources are limited or repeated training is required, for example during hyperparameter optimisation. Graph condensation offers a remedy: it creates a much smaller synthetic graph that mirrors the structure and features of the original, reducing both training and storage costs. Existing condensation methods, however, typically depend on clean, supervised labels, which limits their effectiveness when labels are scarce, noisy, or inconsistent. PLGC removes this dependency. It constructs and refines latent pseudo-labels that capture the inherent structural and feature characteristics of the original graph, and it optimises the condensed graph to align with the original’s structural and feature statistics, all without ground-truth labels. The method alternates between two components, pseudo-label estimation and condensed-graph optimisation, with the overall aim of preserving the latent structural statistics of the original graph and ensuring accurate embedding alignment, even when the data are noisy or incomplete.
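To make the alternating procedure concrete, here is a minimal sketch, not the authors’ code: it assumes a randomly initialised two-layer GCN-style encoder in place of whatever encoder the method actually uses, estimates pseudo-labels by k-means clustering of node embeddings, and fits learnable synthetic features and adjacency logits so that per-pseudo-class mean embeddings match those of the original graph. Every name here (`SimpleGCN`, `estimate_pseudo_labels`, `class_means`, `condense`), the choice of matched statistic, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of PLGC-style alternating optimisation: illustrative only,
# not the authors' implementation. Pseudo-labels come from k-means over node
# embeddings; the matched "latent statistic" is the per-class mean embedding.
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

class SimpleGCN(torch.nn.Module):
    """Two-layer GCN-style encoder over a dense adjacency matrix."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.lin1 = torch.nn.Linear(in_dim, hid_dim)
        self.lin2 = torch.nn.Linear(hid_dim, out_dim)

    def forward(self, adj, x):
        a = adj + torch.eye(adj.size(0))             # add self-loops
        d = a.sum(1).clamp(min=1e-6).pow(-0.5)
        a = d[:, None] * a * d[None, :]              # symmetric normalisation
        return self.lin2(a @ F.relu(self.lin1(a @ x)))

def estimate_pseudo_labels(emb, k):
    """'E-step': cluster node embeddings into k latent pseudo-classes."""
    km = KMeans(n_clusters=k, n_init=10).fit(emb.cpu().numpy())
    return torch.as_tensor(km.labels_, dtype=torch.long)

def class_means(emb, labels, k):
    """Per-pseudo-class mean embedding: a simple latent statistic."""
    return torch.stack([emb[labels == c].mean(0) for c in range(k)])

def condense(adj, x, k=7, n_syn=70, steps=200, rounds=5):
    enc = SimpleGCN(x.size(1), 64, 32)
    # Learnable synthetic features; synthetic adjacency parameterised by logits.
    xs = torch.randn(n_syn, x.size(1), requires_grad=True)
    adj_logits = torch.randn(n_syn, n_syn, requires_grad=True)
    ys = torch.arange(n_syn) % k                     # fixed synthetic pseudo-labels
    opt = torch.optim.Adam([xs, adj_logits], lr=1e-2)

    for _ in range(rounds):
        # Step 1: re-estimate pseudo-labels and target statistics on the original.
        with torch.no_grad():
            emb = enc(adj, x)
        y = estimate_pseudo_labels(emb, k)
        target = class_means(emb, y, k)
        # Step 2: fit the condensed graph to those latent statistics.
        for _ in range(steps):
            a_syn = torch.sigmoid(adj_logits)
            a_syn = (a_syn + a_syn.T) / 2            # keep adjacency symmetric
            loss = F.mse_loss(class_means(enc(a_syn, xs), ys, k), target)
            opt.zero_grad(); loss.backward(); opt.step()
    return torch.sigmoid(adj_logits).detach(), xs.detach(), ys
```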

By relying on pseudo-labels, PLGC learns from the graph’s intrinsic properties rather than from external, potentially flawed, annotations. Experiments across node classification and link prediction tasks show that PLGC matches state-of-the-art supervised condensation methods on clean datasets, while exhibiting significant robustness in the presence of label noise, frequently surpassing all baseline methods by a considerable margin. These findings underscore the practical and theoretical benefits of self-supervised graph condensation, particularly in noisy or weakly-labelled environments. The authors also analysed the stability of pseudo-label estimation, establishing conditions under which the pseudo-labels remain reliable under noise or distributional variation.
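One way to probe that stability empirically, purely as an illustration and not the paper’s protocol, is to perturb the embeddings with noise, re-estimate cluster-based pseudo-labels, and measure how much the resulting partition changes, for instance with the adjusted Rand index:

```python
# Illustrative stability probe (our construction, not the paper's protocol):
# perturb embeddings with Gaussian noise, re-estimate k-means pseudo-labels,
# and score agreement with the clean partition via the adjusted Rand index.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def pseudo_label_stability(emb, k=7, noise_scales=(0.01, 0.05, 0.1), seed=0):
    rng = np.random.default_rng(seed)
    base = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(emb)
    scores = {}
    for s in noise_scales:
        noisy = emb + rng.normal(scale=s * emb.std(), size=emb.shape)
        pert = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(noisy)
        scores[s] = adjusted_rand_score(base, pert)  # 1.0 means identical partitions
    return scores

# Example on random 32-dimensional embeddings:
emb = np.random.default_rng(1).normal(size=(500, 32))
print(pseudo_label_stability(emb))
```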

On the theoretical side, the authors show that PLGC preserves latent structural statistics, guaranteeing accurate alignment of node embeddings between the original and condensed graphs. On the empirical side, evaluation across five benchmark datasets shows that PLGC significantly outperforms supervised condensation in low-label, noisy-label, and label-shift regimes, across both node classification and link prediction, functioning effectively precisely where supervised methods falter due to data imperfections.
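A common empirical proxy for such an embedding-divergence guarantee, chosen here for illustration rather than taken from the paper, is the maximum mean discrepancy (MMD) between the embedding sets of the original and condensed graphs:

```python
# Empirical proxy for embedding divergence (our choice of metric, not
# necessarily the quantity bounded in the paper): maximum mean discrepancy
# (MMD) with an RBF kernel between two embedding sets.
import torch

def rbf_mmd2(x, y, sigma=1.0):
    """Biased MMD^2 estimate between samples x (n, d) and y (m, d)."""
    def kernel(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

# Example: a well-condensed graph should drive this towards zero.
z_orig = torch.randn(1000, 32)        # embeddings of the original graph
z_cond = torch.randn(50, 32) + 0.1    # embeddings of the condensed graph
print(f"MMD^2 = {rbf_mmd2(z_orig, z_cond):.4f}")
```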

When clean labels are available, PLGC remains competitive with supervised methods, and it generalises better in multi-source and distributionally heterogeneous settings. Theoretical bounds on the embedding divergence between original and condensed graphs establish principled robustness under weak or unreliable supervision. The analysis also identifies structural limitations of supervised graph condensation, demonstrating both analytically and empirically that its effectiveness deteriorates under label noise, scarcity, or distribution shift. By alternating pseudo-label estimation with condensed-graph optimisation, PLGC eliminates the need for ground-truth labels entirely, producing a compact synthetic graph and associated pseudo-labels that preserve task-relevant latent structure and enable efficient training of downstream graph models with limited supervision. This highlights the potential of self-supervised condensation for practical applications that require efficient, minimally supervised training of graph models.
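Putting the pieces together, a hypothetical end-to-end usage, continuing the `condense` / `SimpleGCN` sketch above (names and hyperparameters remain illustrative), would train a small GNN on the condensed graph and its pseudo-labels and then apply it to the original graph:

```python
# Hypothetical end-to-end usage, continuing the `condense` / `SimpleGCN`
# sketch above: train a small GNN on the condensed graph and its
# pseudo-labels, then read out predictions on the original graph.
import torch
import torch.nn.functional as F

def train_on_condensed(a_syn, xs, ys, model, epochs=300, lr=1e-2):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        loss = F.cross_entropy(model(a_syn, xs), ys)
        opt.zero_grad(); loss.backward(); opt.step()
    return model

# Usage (all names are from the earlier illustrative sketch):
#   a_syn, xs, ys = condense(adj, x, k=7)
#   model = train_on_condensed(a_syn, xs, ys, SimpleGCN(x.size(1), 64, 7))
#   preds = model(adj, x).argmax(1)   # pseudo-class predictions, full graph
```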

👉 More information
🗞 PLGC: Pseudo-Labeled Graph Condensation
🧠 ArXiv: https://arxiv.org/abs/2601.10358

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.

Latest Posts by Rohail T.:

SAR-ATR Achieves Robustness Against Adversarial Examples with Novel Spatial Deformation Technique
January 19, 2026

LLM Discovery Advances Vision Models with Non-Standard Channel Priors and Vast Data Generation
January 19, 2026

Detects 33.8% More Mislabeled Data with Adaptive Label Error Detection for Better Machine Learning
January 17, 2026