AI Memorises Noisy Data by Building on Its Reasoning, Researchers Find

The capacity of artificial neural networks to perform complex reasoning tasks while readily memorising flawed training data presents a significant puzzle in artificial intelligence. Researchers now investigate how these networks balance memorisation with generalisable reasoning, and find, surprisingly, that memorisation relies on the underlying reasoning mechanisms rather than operating as a separate process. Yupei Du from Utrecht University, together with Philipp Mondorf, Silvia Casola, Yuekun Yao, Robert Litschko and Barbara Plank from LMU Munich, details these findings in the article “Reason to Rote: Rethinking Memorization in Reasoning”, demonstrating that memorisation of noisy labels occurs through distributed encoding and outlier heuristics, building upon existing neural pathways rather than overriding them. Their work, based on synthetic datasets for four-digit addition and two-hop relational reasoning, suggests a more nuanced picture of how these networks achieve robust performance.

Large language models demonstrate a duality, combining reasoning skills with substantial memorisation capabilities. These models successfully tackle tasks requiring logical thought, such as arithmetic and relational reasoning, yet simultaneously retain and reproduce verbatim segments of their training data, including coherent text and potentially inaccurate information. The coexistence of these capabilities presents a scientific challenge, particularly when models confidently output incorrect information, prompting investigation into the underlying mechanisms.

It is well established that deep neural networks can memorise noisy or inaccurate training data while maintaining strong generalisation, suggesting that memorisation is not simply a byproduct of training but may even support robust performance on real-world tasks. What has been missing is an account of how models reconcile memorising incorrect information with their ability to generalise and solve problems correctly. This work addresses that gap by investigating the mechanisms of memorisation within reasoning tasks. The researchers focus on four-digit addition and two-hop relational reasoning, where solutions are unambiguous and easily verifiable, and introduce controlled label noise into the training data to observe how models memorise the inaccuracies while continuing to generalise.
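To make that setup concrete, here is a minimal sketch, in Python, of how a four-digit addition dataset with a controlled fraction of corrupted labels might be built. The exact input format, noise rate and sampling scheme used in the paper may differ; the field names below are illustrative.

```python
import random

def make_addition_dataset(n_examples=10000, noise_rate=0.05, seed=0):
    """Toy four-digit addition dataset with a controlled fraction of noisy labels.

    Illustrative sketch only: the paper's exact formatting, noise rate and
    corruption scheme may differ.
    """
    rng = random.Random(seed)
    examples = []
    for _ in range(n_examples):
        a = rng.randint(1000, 9999)
        b = rng.randint(1000, 9999)
        label = a + b
        is_noisy = rng.random() < noise_rate
        if is_noisy:
            # Corrupt the label so it can only be reproduced by memorisation.
            wrong = label
            while wrong == label:
                wrong = rng.randint(2000, 19998)
            label = wrong
        examples.append({"input": f"{a}+{b}=", "target": str(label), "is_noisy": is_noisy})
    return examples

train_data = make_addition_dataset()
print(sum(ex["is_noisy"] for ex in train_data), "noisy examples out of", len(train_data))
```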

The core question driving this research is whether memorisation and reasoning are distinct or intertwined, specifically investigating if memorising incorrect labels relies on the same mechanisms used for generalisable reasoning, or if it represents a separate process. Understanding this interaction is crucial for developing more robust and reliable artificial intelligence systems, capable of both learning from data and applying knowledge effectively.

The study examines how models simultaneously reason and retain spurious information, and, crucially, why this memorisation does not necessarily degrade overall performance. The experiments use synthetic datasets specifically designed to test reasoning abilities with intentionally introduced noise. Focusing on four-digit addition and two-hop relational reasoning, the datasets allow precise control over the noisy labels and close observation of the model’s response.

A key finding is that the reasoning mechanisms remain engaged even when the model retrieves a memorised, incorrect label: the model does not simply recall the noisy answer but continues to compute intermediate reasoning steps. Disrupting these intermediate steps demonstrably impairs memorisation, indicating that memorisation is not a separate process occurring in isolation but is integrated with, and dependent on, the model’s reasoning capabilities. The research further shows that memorisation operates through a distributed encoding scheme: rather than storing information in a simple look-up table, the model aggregates information from various inputs and intermediate calculations to encode the noisy label, spreading the representation across many neurons.
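The causal claim, that disrupting intermediate reasoning computations impairs recall of memorised labels, is typically tested by intervening directly on hidden activations. The sketch below illustrates the general idea in PyTorch, using a forward hook to overwrite one layer’s output with activations cached from another run; the module path `model.transformer.h[layer_idx]` and the cache format are hypothetical placeholders, not the authors’ actual code.

```python
import torch

def run_with_patched_layer(model, layer_idx, replacement_activations, input_ids):
    """Overwrite one transformer layer's output while processing a memorised
    noisy example, then return the model's prediction.

    If the prediction no longer matches the memorised (incorrect) label, the
    patched computation was causally involved in recalling it. The module path
    is GPT-2-style and purely illustrative; it differs between architectures.
    """
    def hook(module, inputs, output):
        # Replace this layer's hidden states with activations cached elsewhere
        # (e.g. from a clean example or a pre-memorisation checkpoint).
        if isinstance(output, tuple):
            return (replacement_activations,) + output[1:]
        return replacement_activations

    handle = model.transformer.h[layer_idx].register_forward_hook(hook)
    try:
        with torch.no_grad():
            logits = model(input_ids).logits
    finally:
        handle.remove()
    return logits[:, -1].argmax(dim=-1)  # predicted next token
```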

The researchers identify ‘outlier heuristics’ as a crucial component of this memorisation process: subtle shifts in existing neuron activation patterns that allow the model to accommodate noisy labels without fundamentally altering its core functionality. In effect, the model does not create new pathways for memorisation but slightly adjusts existing ones, leveraging pre-existing representations to encode the incorrect information; the neurons involved can be identified by their unusual activation patterns. Activation patching, in which the activations of these outlier neurons are replaced with those from a pre-memorisation state or from a different data point, significantly reduces the model’s ability to recall the incorrect labels, underlining the importance of understanding the internal mechanisms of large language models. Applying Iterative Nullspace Projection (INLP), a method for removing linearly encoded information from the model’s hidden representations, also proves effective in mitigating the influence of these outlier neurons and improving the model’s generalisation.
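INLP itself is a simple linear procedure. As a rough illustration, assuming hidden representations `X` and binary labels `y` marking whichever attribute should be removed (here, whether an example carries a memorised noisy label), one can iteratively fit a linear probe and project its direction out of the representations. The probe choice and iteration count below are arbitrary, not the paper’s settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def inlp(X, y, n_iterations=10):
    """Minimal sketch of Iterative Nullspace Projection (INLP).

    Repeatedly fits a linear probe for attribute y on representations X,
    then projects the representations onto the nullspace of the probe's
    weights so the attribute is no longer linearly decodable.
    Hyperparameters are illustrative.
    """
    dim = X.shape[1]
    P = np.eye(dim)                 # accumulated projection matrix
    X_proj = X.copy()
    for _ in range(n_iterations):
        probe = LogisticRegression(max_iter=1000).fit(X_proj, y)
        W = probe.coef_             # shape (1, dim) for binary y
        # Projection onto the nullspace of W:  P_W = I - W^+ W
        P_W = np.eye(dim) - np.linalg.pinv(W) @ W
        P = P_W @ P
        X_proj = X @ P.T            # re-project the original representations
    return X_proj, P
```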

Neural networks exhibit a curious ability both to generalise reasoning skills and to readily memorise arbitrary training instances, including noisy labels, despite the potential for interference. Through experiments on the synthetic reasoning datasets, four-digit addition and two-hop relational reasoning, the researchers demonstrate that memorisation relies on existing generalisable reasoning mechanisms: models continue to compute intermediate reasoning outputs even when retrieving memorised, and potentially incorrect, labels. Intervening in this reasoning process demonstrably impairs memorisation, showing that memorisation does not operate as a simple look-up mechanism mapping inputs directly to noisy labels. Instead, it functions through distributed encoding, aggregating information from various inputs and intermediate results.

Analysis of the four-digit addition dataset reveals that memorisation frequently occurs via the adoption of ‘outlier heuristics’: existing neuron activation patterns shift subtly to accommodate noisy labels, indicating that the network does not simply overwrite existing knowledge but adapts it to fit the erroneous data. This adaptation is driven by specific neurons whose activation patterns become highly influential during the memorisation phase, as demonstrated by activation patching, in which activations from the pre-memorisation model are substituted in to reveal the impact of these outlier neurons on the model’s predictions. Furthermore, applying Iterative Nullspace Projection (INLP), a technique for removing linearly encoded information from a network’s representations, successfully removes the memorised information and restores the model’s original generalisation ability, suggesting that the memorised noise is a distinct component of the network’s state, separable from the underlying reasoning mechanisms.

These findings suggest that memorisation of label noise builds upon, rather than overrides, the underlying reasoning mechanisms, explaining the observed ‘benign memorisation’ phenomenon, in which networks learn effectively from noisy data without significant degradation in generalisation performance. The research highlights the robustness of neural networks and their ability to integrate imperfect information into a cohesive representational framework. Future work could explore how far these outlier heuristics generalise across different network architectures and datasets. Investigating the specific properties of the neurons involved, such as their connectivity and activation patterns, could provide further insight into the mechanisms of benign memorisation, while methods that explicitly identify and mitigate the influence of outlier heuristics could lead to more robust and reliable machine learning models.

Expanding the scope of investigation to include more complex reasoning tasks and real-world datasets is also crucial, helping to determine the limits of the observed phenomena and assess their applicability to a wider range of machine learning applications. Finally, exploring the interplay between memorisation and generalisation in continual learning, where models are exposed to an ongoing stream of data, could reveal new strategies for building adaptive and resilient artificial intelligence systems.

👉 More information
🗞 Reason to Rote: Rethinking Memorization in Reasoning
🧠 DOI: https://doi.org/10.48550/arXiv.2507.04782
