AI ‘Brain’ Mapping Reveals How Language Models Store and Recall Facts

Scientists are tackling the complex problem of understanding how Large Language Models (LLMs) process information internally, where distinguishing meaningful computational signals from background noise remains a major challenge. Jonathan Pan from the Home Team Science and Technology Agency, Singapore, together with colleagues, introduces the Quantum Sieve Tracer, a novel framework designed to pinpoint factual recall circuits within transformer-based language models. Using the open-weight models Meta Llama-3.2-1B and Alibaba Qwen2.5-1.5B-Instruct, the researchers reveal a fundamental architectural difference between the two systems. In particular, they demonstrate that Layer 9 in Llama functions as an “Interference Suppression” circuit, a surprising result in which removing specific attention components actually enhances factual recall. This work provides a high-resolution method for analyzing attention mechanisms and represents a significant step toward truly interpretable artificial intelligence.

The authors implement a modular pipeline that first localizes critical layers using classical causal tracing and then maps specific attention-head activations into an exponentially large quantum Hilbert space. Through this two-stage analysis, they uncover a fundamental architectural divergence between the models. While Qwen’s Layer 7 operates as a conventional “Recall Hub,” Llama’s Layer 9 behaves as an “Interference Suppression” circuit, where ablation of the identified attention heads paradoxically improves recall performance. The transformer’s residual stream, which serves as the primary data pathway, often contains superpositions of multiple semantic tasks. To disentangle these, the study employs a strictly hybrid protocol that combines the scalability of classical causal tracing with the high-dimensional expressivity of quantum kernels, achieving diagnostic resolution unattainable by either method alone.

The paper describes the software architecture and experimental results of this approach as applied to two open-weight models using the PennyLane framework. Section II reviews foundational literature in mechanistic interpretability and quantum machine learning kernels. Section III details the proposed hybrid methodology, explaining how classical causal localization is integrated with the quantum feature sieve. Section IV outlines the experimental setup, including computational environment and model specifications. Section V presents quantitative results from layer localization and quantum interaction mapping. Section VI discusses the implications of these findings for understanding architectural inductive biases, and Section VII concludes with directions for future research in quantum-assisted interpretability.

The development of the Quantum Sieve Tracer draws on two primary research domains: classical mechanistic interpretability and quantum machine learning kernels. In mechanistic interpretability, prior work has yielded substantial insights into transformer behavior. The causal tracing method introduced by Meng et al. localized factual knowledge within specific MLP modules by corrupting and restoring hidden states, while the discovery of induction heads by Olsson et al. explained key aspects of in-context learning. However, these classical techniques often assume linear separability of semantic features or require computationally expensive activation-patching sweeps. The present work builds on these localization principles but replaces linear probes with nonlinear quantum kernels capable of detecting subtler geometric divergences in representation space.

Quantum machine learning kernel methods exploit the kernel trick by mapping data into high-dimensional Hilbert spaces where linear separation becomes feasible. Schuld and Killoran formalized this framework for supervised learning. While previous applications have focused on generative modeling or external classification tasks, this study applies quantum kernels diagnostically. Rather than classifying external inputs, the quantum feature map is used to measure the internal consistency and structural geometry of a neural network’s own representations.

The Quantum Sieve Tracer operates as a strictly hybrid computational pipeline following a “locate-then-analyze” principle. Classical causal tracing performs a coarse-grained search to identify the most critical layer for factual recall, after which the quantum kernel conducts fine-grained structural analysis. This process consists of four stages: Classical Causal Localization, Activation Extraction, Feature Sieving, and Quantum Kernel Estimation.

In the Classical Causal Localization stage, the authors compute a recovery score for each layer using the methodology of Meng et al. The recovery score quantifies the extent to which restoring activations from a clean run into a corrupted run recovers the probability of the correct token. The layer exhibiting the strongest restoration is selected as the knowledge hub for quantum analysis.
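
A minimal sketch of that localization step, written against a HuggingFace causal language model with Llama-style module paths, might look like the following; the function name, argument layout, and the choice to patch the full hidden state at the subject position are illustrative assumptions rather than the authors’ exact implementation.

```python
# Minimal sketch of a layer-wise recovery score (illustrative, not the
# authors' exact code). Assumes a HuggingFace causal LM with Llama-style
# module paths (model.model.layers[i]).
import torch

def recovery_score(model, clean_ids, corrupt_embeds, subject_pos, target_id, layer):
    with torch.no_grad():
        # Clean run: cache the hidden state at the subject position.
        clean_out = model(clean_ids, output_hidden_states=True)
        clean_hidden = clean_out.hidden_states[layer + 1][:, subject_pos, :]
        clean_prob = clean_out.logits[:, -1, :].softmax(-1)[0, target_id]

        # Corrupted baseline: subject embeddings already replaced by noise.
        corrupt_logits = model(inputs_embeds=corrupt_embeds).logits
        corrupt_prob = corrupt_logits[:, -1, :].softmax(-1)[0, target_id]

        # Patched run: restore the clean state at this layer's output.
        def patch(module, inputs, output):
            hidden = output[0] if isinstance(output, tuple) else output
            hidden[:, subject_pos, :] = clean_hidden
            return output

        handle = model.model.layers[layer].register_forward_hook(patch)
        patched_logits = model(inputs_embeds=corrupt_embeds).logits
        patched_prob = patched_logits[:, -1, :].softmax(-1)[0, target_id]
        handle.remove()

    # Fraction of the clean-vs-corrupt probability gap recovered by the patch.
    return ((patched_prob - corrupt_prob) / (clean_prob - corrupt_prob)).item()
```

Running this function over every layer and keeping the one with the largest score is what selects the knowledge hub for the quantum stage.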

Once the critical layer is identified, activation extraction and contrastive generation isolate the factual recall circuit. Reference activations are generated using factual prompts, while noise activations are produced by replacing subject nouns with random alternatives. Output tensors from all attention heads in the selected layer are extracted using forward hooks in PyTorch.
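
A compact version of this extraction, again assuming Llama-style HuggingFace module paths, is sketched below. Hooking the attention block’s output projection (o_proj) captures the head-concatenated activations before they are mixed, which makes separating individual heads straightforward; the prompt and the chosen layer index are placeholders.

```python
# Hedged sketch of per-head activation extraction with a PyTorch forward
# hook. The input to o_proj is the head-concatenated attention output, so
# reshaping it cleanly separates the individual heads.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-1B"           # open-weight model from the study
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

layer_idx = 9                                    # the Llama "knowledge hub" layer
n_heads = model.config.num_attention_heads
head_dim = model.config.hidden_size // n_heads
captured = {}

def grab_heads(module, inputs, output):
    x = inputs[0]                                # (batch, seq, n_heads * head_dim)
    b, s, _ = x.shape
    captured[layer_idx] = x.reshape(b, s, n_heads, head_dim).detach().cpu()

o_proj = model.model.layers[layer_idx].self_attn.o_proj
handle = o_proj.register_forward_hook(grab_heads)

with torch.no_grad():
    model(**tok("The Eiffel Tower is located in the city of", return_tensors="pt"))
handle.remove()

# captured[9][:, -1] now holds each head's activation at the final position.
```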

Because direct encoding of raw activation vectors into quantum circuits is infeasible, a feature-sieving mechanism is applied. Logistic regression probes distinguish reference from noise activations for each attention head, and the top neurons with the highest absolute coefficients are selected. These features are normalized to a bounded range suitable for quantum rotation gates.
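
In scikit-learn terms, the sieve for a single attention head could be sketched roughly as follows; the top-five setting and the [0, π] rescaling follow the description later in this article, while the array names and exact normalization are illustrative.

```python
# Sketch of the feature sieve: a logistic-regression probe per attention
# head, keeping the neurons with the largest absolute coefficients and
# rescaling them to [0, pi] so they can serve as quantum rotation angles.
import numpy as np
from sklearn.linear_model import LogisticRegression

def sieve_features(ref_acts, noise_acts, k=5):
    """ref_acts, noise_acts: (n_samples, head_dim) arrays for a single head."""
    X = np.vstack([ref_acts, noise_acts])
    y = np.concatenate([np.ones(len(ref_acts)), np.zeros(len(noise_acts))])
    probe = LogisticRegression(max_iter=1000).fit(X, y)

    coefs = np.abs(probe.coef_[0])
    top = np.argsort(coefs)[::-1][:k]            # indices of the top-k neurons
    selected = X[:, top]

    # Normalize to [0, pi]: a bounded range suitable for rotation gates.
    lo, hi = selected.min(axis=0), selected.max(axis=0)
    angles = np.pi * (selected - lo) / (hi - lo + 1e-8)
    return top, angles
```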

Quantum kernel estimation is performed by encoding the selected features into a multi-qubit quantum state using angle embedding implemented in PennyLane. Fidelity matrices are computed between attention heads to map the geometric topology of interactions within the critical layer. This reveals which heads share significant information overlap in the quantum feature space.
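
A hedged PennyLane sketch of that kernel is shown below: one feature vector is angle-embedded, the adjoint embedding of the other is applied, and the probability of measuring the all-zeros state gives the squared state overlap. The qubit count and the pairwise loop over heads are assumptions; the authors’ circuit may differ in detail.

```python
# Hedged sketch of the quantum fidelity kernel: angle-embed one feature
# vector, apply the adjoint embedding of the other, and read off the
# probability of the all-zeros state, which equals the squared overlap.
import numpy as np
import pennylane as qml

n_qubits = 5                                   # one qubit per sieved feature
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def overlap_circuit(x1, x2):
    qml.AngleEmbedding(x1, wires=range(n_qubits))
    qml.adjoint(qml.AngleEmbedding)(x2, wires=range(n_qubits))
    return qml.probs(wires=range(n_qubits))

def fidelity(x1, x2):
    return overlap_circuit(x1, x2)[0]          # P(|0...0>) = |<phi(x2)|phi(x1)>|^2

def interaction_matrix(head_features):
    """head_features: list of length-n_qubits vectors, one per attention head."""
    H = len(head_features)
    K = np.zeros((H, H))
    for i in range(H):
        for j in range(H):
            K[i, j] = fidelity(head_features[i], head_features[j])
    return K
```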

Experiments were conducted in a Python 3.10 environment using a single NVIDIA T4 GPU, with PyTorch, HuggingFace Transformers, PennyLane, and Scikit-learn. For Llama-3.2-1B, the interaction matrix revealed a sparse topology dominated by a small number of specialized heads. Ablation of these heads resulted in a negative probability drop, meaning the probability of the correct token increased, confirming the presence of an interference suppression mechanism. In contrast, Qwen2.5-1.5B exhibited a dense interaction topology, and ablation of Layer 7 heads caused a positive performance drop, consistent with a classical recall hub.
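
The ablation measurement behind those numbers can be approximated as in the sketch below, which reuses the model, tokenizer, and head dimension from the extraction sketch above: zero out the chosen heads’ slices at the hub layer’s output projection and compare the correct-token probability with and without the intervention. The head indices, prompt, and target token are placeholders, not values from the paper.

```python
# Illustrative head ablation: a forward pre-hook on o_proj zeroes the slices
# belonging to the selected heads, so their contribution never reaches the
# residual stream. Head indices below are placeholders, not the paper's.
import torch

def ablate_heads(model, layer_idx, heads, head_dim):
    def pre_hook(module, inputs):
        x = inputs[0].clone()                  # (batch, seq, n_heads * head_dim)
        for h in heads:
            x[..., h * head_dim:(h + 1) * head_dim] = 0.0
        return (x,)
    o_proj = model.model.layers[layer_idx].self_attn.o_proj
    return o_proj.register_forward_pre_hook(pre_hook)

def target_prob(model, ids, target_id):
    with torch.no_grad():
        return model(ids).logits[:, -1, :].softmax(-1)[0, target_id].item()

ids = tok("The Eiffel Tower is located in the city of", return_tensors="pt").input_ids
target_id = tok(" Paris", add_special_tokens=False).input_ids[0]

baseline = target_prob(model, ids, target_id)
handle = ablate_heads(model, layer_idx=9, heads=[3, 7], head_dim=head_dim)
ablated = target_prob(model, ids, target_id)
handle.remove()

# drop > 0: a classic recall hub (ablation hurts); drop < 0: interference
# suppression (ablation helps), the signature reported for Llama's layer 9.
drop = baseline - ablated
```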

Statistical validation using Student’s t-tests confirmed that the observed effects were non-random. Additionally, the near-zero Spearman rank correlation between classical causal traces and quantum fidelity vectors demonstrated that the Quantum Sieve captures information fundamentally distinct from classical linear probes. Together, these results show that integrating classical causal tracing with quantum kernels provides a robust, multi-scale framework for understanding the internal mechanisms of LLMs.
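
In code, those two checks reduce to a pair of SciPy calls along the lines below; the exact quantities compared in the t-test (here, ablation effects for identified versus randomly chosen heads) are an assumption about how the test was framed.

```python
# Hedged sketch of the statistical validation. All inputs are placeholder
# arrays of measured values: per-prompt ablation effects for the identified
# and random heads, and per-head classical-trace vs. quantum-fidelity scores.
from scipy.stats import ttest_ind, spearmanr

def validate(effects_identified, effects_random, classical_scores, quantum_scores):
    t_stat, p_val = ttest_ind(effects_identified, effects_random)
    rho, rho_p = spearmanr(classical_scores, quantum_scores)
    return {"t": t_stat, "p": p_val, "spearman_rho": rho, "spearman_p": rho_p}
```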

Layer-specific mechanisms governing factual recall in large language models

Using the Meta Llama-3.2-1B and Alibaba Qwen2.5-1.5B-Instruct models, researchers implemented a two-stage analysis to characterize factual recall circuits. The study revealed architectural divergence between the two models, specifically in how they process factual information. In the Qwen2.5-1.5B-Instruct model, layer 7 functions as a classic Recall Hub, demonstrating a constructive mechanism for factual recall.

Conversely, layer 9 in the Llama-3.2-1B model operates as an Interference Suppression circuit, exhibiting a reductive mechanism. Ablating identified heads within the Llama model’s layer 9 paradoxically improved factual recall performance. This suggests that these heads actively suppress interfering information rather than directly contributing to recall.

The research employed a Recovery Score metric to pinpoint the critical layers for factual recall, measuring how much of the correct token’s probability is restored when clean activations are patched back into a corrupted run. The highest restoration was observed in Qwen’s layer 7 and Llama’s layer 9, marking them as the Knowledge Hubs selected for high-resolution analysis.

Classical Causal Tracing was first used to localize these critical layers before quantum kernels were applied for detailed structural analysis. Activation vectors were extracted from all Attention Heads at the identified layer, generating Reference and Noise sets through factual query prompting and subject-noun replacement.

A feature selection mechanism, termed “The Sieve”, was implemented using Logistic Regression to reduce dimensionality. This involved training a classifier to distinguish between Reference and Noise activations, selecting the top five neurons with the highest absolute coefficients, and normalizing the resulting vectors.

Feature selection applied a coefficient-magnitude threshold of 0.1 to the trained Logistic Regression probes. This dimensionality reduction made it practical to encode the activation data into quantum rotation gates for the subsequent quantum kernel estimation. The overall approach thus combines classical causal tracing, which pinpoints the relevant layers, with a quantum kernel that maps attention-head activations into a high-dimensional Hilbert space. Analysis of the open-weight models, Llama-3.2 and Qwen2.5, revealed fundamental architectural differences in how they handle factual recall.

Specifically, the research identified that layer 7 in Qwen2.5 functions as a classic recall hub, while layer 9 in Llama-3.2 operates as an interference suppression circuit, paradoxically improving recall when ablated. The developed kernels successfully distinguished between these constructive recall mechanisms and reductive suppression mechanisms, providing a detailed tool for analyzing attention topology.

A near-zero correlation between classical and quantum traces confirms the quantum kernel detects geometric features not captured by linear probes. The authors acknowledge limitations related to the current scale of analysis and the need for validation on near-term quantum hardware. Future work will explore extending this “Quantum Worldline” concept to dynamic reasoning processes, testing the robustness of the angle embedding strategy on physical devices, and investigating “Quantum Steering” for precise model behavior editing. These findings establish a viable protocol for high-resolution mechanistic interpretability and offer a path toward understanding the fine-grained workings of large language models.

👉 More information
🗞 The Quantum Sieve Tracer: A Hybrid Framework for Layer-Wise Activation Tracing in Large Language Models
🧠 ArXiv: https://arxiv.org/abs/2602.06852

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
