Scientists are addressing the challenge of mechanistic interpretability in billion-parameter large language models (LLMs), aiming to understand how these models compute internally. Mohammed Mudassir Uddin, Shahnawaz Alam, and Mohammed Kaif Pasha, from Muffakham Jah College of Engineering and Technology, introduce Hierarchical Attribution Graph Decomposition (HAGD), a novel framework that efficiently extracts sparse computational circuits from LLMs. HAGD drastically reduces computational cost compared with exhaustive search over candidate circuits, enabling scalable analysis of models ranging from 117 million to 70 billion parameters.
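To make the idea concrete, here is a minimal, hypothetical sketch of attribution-guided hierarchical pruning: edges of the model's computational graph are grouped into coarse modules, the most attributed modules are kept first, and only the highest-scoring edges inside them survive, avoiding an exhaustive search over edge subsets. The Edge structure, scores, and budgets below are illustrative assumptions, not the authors' HAGD implementation.

```python
# Minimal sketch of hierarchical, attribution-guided circuit pruning.
# All names (Edge, module grouping, budgets) are illustrative assumptions.
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    src: str        # upstream component, e.g. "L3.head7"
    dst: str        # downstream component, e.g. "L5.mlp"
    module: str     # coarse group the edge belongs to, e.g. "layer3"
    score: float    # attribution of this edge to the target behaviour

def extract_circuit(edges, module_budget, edge_budget):
    """Two-level decomposition: keep the most attributed modules first,
    then the top-scoring edges inside them, instead of searching over
    every possible edge subset."""
    by_module = defaultdict(list)
    for e in edges:
        by_module[e.module].append(e)

    # Level 1: rank coarse modules by their total attribution mass.
    ranked = sorted(by_module,
                    key=lambda m: sum(e.score for e in by_module[m]),
                    reverse=True)[:module_budget]

    # Level 2: within surviving modules, keep only the strongest edges.
    circuit = []
    for m in ranked:
        circuit += sorted(by_module[m], key=lambda e: e.score,
                          reverse=True)[:edge_budget]
    return circuit
```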
Experiments show that HAGD achieves up to 91% behavioural preservation (±2.3%) on modular arithmetic tasks, while maintaining interpretable subgraph sizes. The team validated the circuits through causal intervention protocols, confirming that these subgraphs represent genuine computational components rather than correlational artifacts. Cross-architecture analyses revealed that extracted circuits share moderate structural similarity (≈67%) across model families such as GPT-2, Llama, and Pythia, suggesting the existence of shared computational patterns and potential universal principles of neural computation.
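The causal-intervention check can be sketched as a simple agreement metric: compare the full model's answer with a run in which everything outside the candidate circuit is ablated, and count how often the two match. The run_model and run_with_circuit_only callables below are hypothetical stand-ins for whatever evaluation harness is used; only the metric itself is shown.

```python
# Hedged sketch of behavioural preservation under causal intervention.
# `run_model` and `run_with_circuit_only` are assumed interfaces, not the paper's API.
def behavioural_preservation(inputs, run_model, run_with_circuit_only, circuit):
    matches = 0
    for x in inputs:
        full_answer = run_model(x)                          # unmodified forward pass
        pruned_answer = run_with_circuit_only(x, circuit)   # complement of the circuit ablated
        matches += int(full_answer == pruned_answer)
    return matches / len(inputs)                            # e.g. ~0.91 on modular arithmetic
```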
HAGD establishes necessity and sufficiency criteria for circuit verification against behavioural benchmarks, providing a robust methodology for interpreting large-scale LLMs. This work lays the foundation for future advances in AI interpretability, highlighting both the potential and current limitations of mechanistic approaches for understanding complex language models.
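As a hedged illustration of how such criteria might be operationalised: a circuit counts as sufficient if it alone preserves behaviour above some threshold, and as necessary if ablating it drops behaviour below another. The thresholds below are illustrative assumptions, not values from the paper.

```python
# Illustrative necessity/sufficiency check; thresholds are assumptions.
def verify_circuit(preservation_with_circuit_only, preservation_without_circuit,
                   sufficiency_threshold=0.9, necessity_threshold=0.5):
    sufficient = preservation_with_circuit_only >= sufficiency_threshold  # circuit alone reproduces the behaviour
    necessary = preservation_without_circuit <= necessity_threshold      # removing the circuit destroys it
    return sufficient and necessary
```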
👉 More information
🗞 Hierarchical Sparse Circuit Extraction from Billion-Parameter Language Models through Scalable Attribution Graph Decomposition
🧠 ArXiv: https://arxiv.org/abs/2601.12879
