Polysemanticity Limits Understanding of Neural Networks in High Dimensions

The hope of understanding artificial intelligence by dissecting individual neurons proved short-lived, undone by a growing recognition of neuron polysemanticity. Researchers initially envisioned identifying specific functions within neural networks, such as a neuron dedicated to recognizing “cats,” or even one responding to the idea of “betraying all humans,” but quickly encountered a more complex reality. Polysemanticity, where a single neuron fires in response to seemingly unrelated stimuli, is a key obstacle to interpreting these systems; in the running example, the “betray all humans” neuron also fires during discussions of cats. “Perhaps the AI was planning a robotic uprising, or perhaps it was simply considering the genealogy of Maine coons,” noted LawrenceC, who discussed these challenges at a recent InkHaven event hosted by Georgia Ray.

Neuron Polysemanticity & Early Representation Research

The neuron-by-neuron approach quickly ran into a significant obstacle: neuron polysemanticity, the phenomenon where a single neuron responds to a surprisingly diverse range of stimuli. This behavior complicates any effort to assign fixed meanings to individual neurons. As LawrenceC explained, recounting a discussion at an InkHaven event hosted by Georgia Ray, the “betray all humans” neuron was observed firing not only during discussions of malicious intent, but also during conversations about cats. This observation undermined the assumption of a one-to-one correspondence between neurons and concepts, prompting a shift in research focus toward understanding how neural networks represent information in a distributed manner. Early theories posited that high-dimensional spaces allow efficient representation of numerous concepts using near-orthogonal vectors, even randomly chosen ones. The Johnson-Lindenstrauss lemma, which implies that the number of nearly orthogonal directions available grows exponentially with dimension, became a key point of reference.
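The near-orthogonality claim is easy to check numerically. The sketch below is a minimal illustration, not code from the original post; the dimension and vector count are arbitrary choices. It samples far more random unit vectors than there are dimensions and measures how much any two of them interfere.

```python
import numpy as np

# Sample far more random unit vectors than there are dimensions and check
# how close to orthogonal they are. Dimension and count are arbitrary choices.
rng = np.random.default_rng(0)
dim, num_vectors = 512, 4096

vectors = rng.standard_normal((num_vectors, dim))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

# Pairwise cosine similarities, keeping only the off-diagonal entries.
cosines = vectors @ vectors.T
off_diag = np.abs(cosines[~np.eye(num_vectors, dtype=bool)])

print(f"mean |cosine|: {off_diag.mean():.3f}")
print(f"max  |cosine|: {off_diag.max():.3f}")
# Typical result: even with 8x more vectors than dimensions, the worst-case
# interference between any two vectors stays small, far from the 1.0 of a
# genuine collision.
```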

This line of thinking led to research projects, beginning after 2022, on “representational superposition”: attempts to extract concepts from large language models by understanding how small networks can represent many concepts simultaneously. However, a counterargument emerged, emphasizing that neural networks are not merely representing concepts but actively computing with them. LawrenceC put the point directly: “Generally, neural networks do not simply represent concepts that are given as input.” Recent work, including research conducted in 2024, has explored how much efficiency computation in superposition can buy, and the gains appear to be roughly quadratic rather than exponential. Adler and Shavit’s “On the Complexity of Neural Computation in Superposition” provides lower bounds on network size, demonstrating that for some classes of problems a network needs at least on the order of sqrt(m/log(m)) neurons to compute with m concepts, a result grounded in information theory and careful mathematical argument. LawrenceC described the paper as a demonstration of “what real computer scientists are like,” praising its approach and mathematical rigor.

Johnson-Lindenstrauss Lemma and Representational Superposition

The pursuit of understanding how neural networks function has undergone a significant recalibration in recent years, shifting from dissecting the roles of individual neurons to grappling with the complexities of distributed representation. Initial optimism about pinpointing specific functions, such as a neuron for “cats” or a neuron for “betraying all humans,” quickly gave way to the realization that networks operate with a far more interwoven architecture. This shift in perspective has been heavily influenced by mathematical results, notably the Johnson-Lindenstrauss lemma. The lemma shows that data points can be mapped into a much lower-dimensional space while approximately preserving their relative distances, which in turn means a high-dimensional space can host an exponentially large number of nearly orthogonal directions. This gave researchers a theoretical framework for how networks might encode far more concepts than they have neurons, at the cost of some interference noise.
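To make the distance-preservation claim concrete, here is a small numerical sketch, with made-up sizes and no connection to the paper or post: random points in a 10,000-dimensional space are pushed through a random projection into 300 dimensions, and their pairwise distances barely move in relative terms.

```python
import numpy as np

# Johnson-Lindenstrauss in practice: randomly project points from a very
# high-dimensional space into a much lower-dimensional one and compare
# pairwise distances before and after. All sizes are arbitrary.
rng = np.random.default_rng(1)
n_points, high_dim, low_dim = 200, 10_000, 300

points = rng.standard_normal((n_points, high_dim))
projection = rng.standard_normal((high_dim, low_dim)) / np.sqrt(low_dim)
projected = points @ projection

def pairwise_distances(x: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix via the squared-norm expansion."""
    sq = (x ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    return np.sqrt(np.clip(d2, 0.0, None))

mask = ~np.eye(n_points, dtype=bool)
ratios = pairwise_distances(projected)[mask] / pairwise_distances(points)[mask]
print(f"distance ratios after projection: "
      f"min={ratios.min():.2f}, max={ratios.max():.2f}")
# The ratios cluster around 1.0: relative geometry survives the
# 10,000 -> 300 compression, up to modest noise.
```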

This idea fueled a wave of research projects after 2022 studying what is now known as representational superposition, in which scientists explored how networks could represent multiple concepts simultaneously using near-orthogonal vectors. The focus has since broadened beyond representation, as researchers began to emphasize that neural networks are not simply storing concepts but actively computing with them, and to ask what that computation costs. Commenting on Adler and Shavit’s lower bound, LawrenceC noted, “Their argument for this is arguably obvious, starting from an information theory based counting argument: you can’t represent enough things if you don’t have enough parameters,” highlighting the careful mathematical approach the authors employed. Adler and Shavit also propose a construction for weight matrices, envisioning them as composed of decompression and compression/computation components, potentially offering a new avenue for designing efficient neural networks.
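The counting argument behind that quote can be written schematically. The inequality below is a generic information-theoretic form of the idea, not the paper’s exact statement or constants: parameters stored at finite precision can only single out so many distinct functions.

```latex
% Schematic parameter-counting argument (generic form, not the paper's
% exact bound): a network with P parameters, each specified to b bits of
% precision, can realize at most 2^{bP} distinct functions, so selecting
% one of N possible target computations requires
\[
  2^{\,bP} \;\ge\; N
  \qquad\Longrightarrow\qquad
  P \;\ge\; \frac{\log_2 N}{b}.
\]
```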

As LawrenceC summarized the shift: “you can’t just think of a neural network as representing a bunch of things.”

Adler & Shavit’s sqrt(m) Network Complexity Bounds

Researchers are increasingly focused on the fundamental limits of neural network efficiency, and a recent paper by Adler and Shavit is drawing attention for its rigorous approach to quantifying computational complexity. LawrenceC, who discussed the paper at InkHaven, noted that his initial impression was “Wow, this is what real computer scientists are like,” highlighting the mathematical precision of Adler and Shavit’s work. The shift toward analyzing network complexity grew out of the earlier, ultimately unsuccessful, attempt to understand artificial intelligence by dissecting individual neurons. While previous research, including work LawrenceC was involved in during 2024, focused on upper bounds, showing how few resources might suffice, Adler and Shavit demonstrate how much is fundamentally necessary. They refined the upper bound, showing that O(sqrt(m) log(m)) neurons are sufficient, and proved a lower bound showing that a network needs at least on the order of sqrt(m/log(m)) neurons.
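Taking the two bounds exactly as stated above, a short calculation (a sketch based on the article’s formulas, not code from the paper; constants and the base of the logarithm are ignored) shows why the gain is described as roughly quadratic: the neuron count needed scales like the square root of the number of concepts m, up to logarithmic factors.

```python
import math

# Neuron-count bounds as a function of the number of concepts m, using the
# formulas quoted above: lower bound ~ sqrt(m / log m), upper bound
# ~ sqrt(m) * log m. Constants and log bases are ignored; this only
# illustrates the ~sqrt(m) scaling.
def lower_bound(m: int) -> float:
    return math.sqrt(m / math.log(m))

def upper_bound(m: int) -> float:
    return math.sqrt(m) * math.log(m)

for m in (1_000, 100_000, 10_000_000):
    print(f"m = {m:>10,} concepts: "
          f"lower ~ {lower_bound(m):>6,.0f} neurons, "
          f"upper ~ {upper_bound(m):>8,.0f} neurons")
# Both bounds track sqrt(m): doubling the neuron budget roughly quadruples
# the number of concepts the network can compute with, a quadratic rather
# than exponential gain over one concept per neuron.
```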

Combining the upper and lower bounds effectively pins the answer down: the required neuron count scales as roughly sqrt(m), up to logarithmic factors. “They imagine this happening inside of a single weight matrix, instead of being spread between weight matrices of different layers,” LawrenceC noted, suggesting the construction could be a useful template for hand-building networks for similar problems. The paper even contains a citation to “personal communication with another MIT professor.” LawrenceC concluded, “This was a cool paper,” praising the authors’ careful proofs and rigorous approach to a challenging problem.

In short, per Adler and Shavit’s lower bound: “you need a network that has at least sqrt(m/log(m)) number of neurons.”

Decompressed/Compressed Weight Matrix Construction for Computation

The pursuit of efficient artificial intelligence has led researchers to increasingly sophisticated ways of organizing computational resources within neural networks, with recent work focusing on how weight matrices can be constructed to maximize performance. This is not simply about shrinking models; it is about fundamentally altering how information is represented and processed, potentially unlocking new levels of computational power. Initial attempts to understand neural networks by dissecting individual neurons proved overly simplistic, quickly revealing the phenomenon of neuron polysemanticity, while the shift toward understanding how networks represent information in high-dimensional spaces opened a new avenue for investigation. The Johnson-Lindenstrauss lemma, with its promise of exponentially more representational capacity as dimensionality grows, fueled early optimism, and Adler and Shavit’s paper builds on this line of work.

Describing his first read of the paper, LawrenceC wrote: “And while reading it, my main impression was something along the lines of ‘Wow, this is what real computer scientists are like.’ I do have some complaints about how they wrote the paper and presented their results, but one thing that stood out to me is that the paper makes it very clear that they really do know a lot of math. They’re really careful with their math and constructions in a way that I think the work I was involved in was just not. A lot of what we did in this area felt like gesturing at proof sketches that should probably work out. They also cite some theoretical computer science results that I didn’t know about, that seem quite relevant.”

They propose a novel approach to weight matrix construction, envisioning each matrix as composed of two distinct parts. “They envision every single weight matrix as being composed internally of two parts: first, a big decompression matrix that takes the small, dense representation and expands it into a large sparse representation. Second, a large computation and compression matrix, which both does the computation on the sparse representation and also compresses it back into a single dense representation,” explains LawrenceC, who presented an overview of the work. Their argument for the lower bound is arguably “obvious” in the sense that it starts from an information-theory-based counting argument: you can’t represent enough things if you don’t have enough parameters. However, making this argument rigorous in the presence of noise turns out to be complicated.
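A toy version of that picture is sketched below. Everything here is assumed for illustration: the sizes are made up, a ReLU stands in for whatever computation is performed on the sparse features, and the code only mirrors the decomposition the quote describes, not the paper’s actual construction.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 64, 1024   # dense width d, number of sparse features m (made-up sizes)

# Part 1: decompression matrix, expanding the small dense activation into a
# large, approximately sparse feature representation.
W_decompress = rng.standard_normal((m, d)) / np.sqrt(d)

# Part 2: computation-and-compression matrix, acting on the sparse features
# and mapping the result back down to a single dense d-dimensional vector.
W_compute_compress = rng.standard_normal((d, m)) / np.sqrt(m)

def layer(x_dense: np.ndarray) -> np.ndarray:
    """One conceptual 'weight matrix': decompress, compute, compress."""
    sparse_features = np.maximum(W_decompress @ x_dense, 0.0)  # ReLU as a stand-in
    return W_compute_compress @ sparse_features

x = rng.standard_normal(d)
y = layer(x)
print(x.shape, "->", y.shape)  # (64,) -> (64,): dense in, dense out,
                               # with a 1024-wide sparse stage in between
```

The split is conceptual: from the outside, the pair still maps a d-dimensional vector to a d-dimensional vector, just as an ordinary weight matrix would, with the wide sparse stage hidden in the middle.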

The researchers demonstrated that, for certain problems, the number of neurons required for computation is fundamentally limited: combining their upper and lower bounds gives a tight answer of roughly the square root of the number of concepts being processed, up to logarithmic factors. The paper’s strength, according to LawrenceC, lies in its mathematical rigor: “This was a cool paper. It really does show that theoretical computer science people have a lot of expertise in doing proofs carefully and doing the work to make their results go through.” While he acknowledged that the work builds on existing ideas, he argued that the detailed mathematical framework and the lower-bound proofs are a significant contribution, sharpening the field’s understanding of the computational limits of neural networks.

LawrenceC did add a caveat about his own earlier work: “It’s worth noting that Dmitry Vaintrob did go through and prove all of our results rigorously – he’s a real mathematician!”
