Multiverse Computing Cuts LLM Perplexity 1.4% on 156-Qubit System

Multiverse Computing has demonstrated a 1.4 percent reduction in perplexity, a measure of how well a language model predicts a sample, by interfacing the widely used Llama 3.1 8B model with a 156-qubit superconducting processor in IBM Quantum System Two. This achievement marks a shift from theoretical explorations of quantum AI toward measurable improvements on real quantum hardware, suggesting a new path to enhancing AI models beyond the limitations of classical computing. The researchers employed Cayley-parameterised unitary adapters, quantum circuit blocks inserted into the language model, which required only 6,000 additional parameters for the performance gain.
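Perplexity is the exponential of the average negative log-likelihood a model assigns to each token of a sample, so lower is better. A minimal sketch, using hypothetical per-token probabilities rather than values from the study:

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-probability per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical per-token probabilities a model might assign to a sample.
baseline_ppl = perplexity([0.20, 0.05, 0.50, 0.10])
# A 1.4 percent reduction corresponds to baseline_ppl * (1 - 0.014).
improved_ppl = baseline_ppl * (1 - 0.014)
print(round(baseline_ppl, 2), round(improved_ppl, 2))
```

A 1.4 percent reduction therefore means the quantum-adapted model assigns, on average, slightly higher probability to the correct next token.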

LLM Parameter Scaling & Classical Limitations

The growth of large language models (LLMs) is encountering a fundamental physical constraint: every trainable parameter occupies classical memory, and scaling a deployed model’s parameter count requires an expansion of compute infrastructure that is unsustainable. While techniques like quantisation and pruning offer some relief, they trade expressive capacity for reduced size, prompting exploration into alternative computational paradigms. Quantum computing presents a potentially transformative solution, leveraging the exponentially larger state space of qubits to circumvent classical memory bottlenecks. Recent work by Multiverse Computing demonstrates a tangible step beyond theoretical proposals, achieving a 1.4 percent improvement in perplexity on the widely used Llama 3.1 8B model. This approach avoids the need for exponentially scaling classical resources: the trained adapters add only 6,000 parameters, while the core model remains frozen.
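The memory constraint is easy to make concrete. A back-of-envelope sketch of the classical footprint, assuming fp16 weights (the precision is an assumption, not stated in the source):

```python
# Rough memory for model weights alone, assuming fp16 (2 bytes per parameter).
params = 8_000_000_000        # Llama 3.1 8B
bytes_per_param = 2           # fp16 assumption
gib = params * bytes_per_param / 2**30
print(f"{gib:.1f} GiB for weights alone")  # ~14.9 GiB
# The quantum adapters add only ~6,000 parameters (about 12 KB at fp16),
# while the extra expressivity lives in the qubit state space.
```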

A systematic study using the smaller SmolLM2 model (135 million parameters) revealed that perplexity improves consistently with increasing unitary block dimension, and demonstrated 83 percent recovery of compression-induced degradation. Researchers found that the system could even provide correct answers to questions that classical baselines failed to address, identifying a sharp noise–expressivity phase transition indicative of a pathway toward practical quantum utility at larger qubit scales. This suggests that even modest increases in quantum hardware capabilities could unlock significant gains in LLM performance.

Cayley Unitary Adapters for LLM Integration

The pursuit of more efficient large language models has led researchers to explore architectures beyond classical computing, with recent attention focused on integrating quantum processing units. Multiverse Computing has demonstrated that quantum enhancement isn’t limited to theoretical proposals but can be realised with measurable improvements on existing hardware. Their work centers on Cayley-parameterised unitary adapters, a specific quantum technique designed to interface with pre-trained LLMs. Unlike previous methods, this strategy inserts quantum circuit blocks into the frozen projection layers of models like Llama 3.1 8B. This configuration resulted in a 1.4 percent improvement in perplexity, a key metric for evaluating language model predictability, with only 6,000 additional parameters. Researchers explain that a systematic study using SmolLM2, a 135-million-parameter model, further illuminated the benefits of this approach. By varying the unitary block dimension, the team observed consistently improving perplexity and an 83 percent recovery of compression-induced degradation.
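The wiring can be sketched classically. Below, a random orthogonal matrix stands in for the quantum unitary block (the actual work parameterises it with a Cayley transform and executes it on hardware); the only point illustrated is that the adapter re-mixes the output of a frozen projection without rescaling it:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                   # toy hidden size; Llama 3.1 8B uses far larger layers

W_frozen = rng.standard_normal((d, d))  # pre-trained projection weights, kept frozen

# Stand-in for the quantum unitary block: any orthogonal matrix.
U, _ = np.linalg.qr(rng.standard_normal((d, d)))

x = rng.standard_normal(d)
out_frozen = W_frozen @ x               # original forward pass
out_adapted = U @ out_frozen            # unitary adapter re-mixes the output

# The adapter is norm-preserving: it rotates activations, never rescales them.
print(np.allclose(np.linalg.norm(out_adapted), np.linalg.norm(out_frozen)))  # True
```

Only the adapter's few parameters are trained; the frozen projection never changes, which is what keeps the classical resource cost so low.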

SmolLM2 Perplexity Improvement with Unitary Blocks

Multiverse Computing researchers, led by Borja Aizpurua, are examining the interplay between quantum processing and language model performance, moving beyond theoretical proposals with a focus on practical implementation. Their work centers on SmolLM2, a 135-million-parameter language model chosen because its manageable size allows exhaustive experimentation, a crucial step towards understanding how quantum circuits can genuinely enhance artificial intelligence. This systematic study revealed consistently improving perplexity with unitary block dimension, indicating that increasing the complexity of the quantum component consistently refined the model’s predictive capabilities. The same adapter design, though adding only 6,000 parameters, also demonstrably improved the perplexity of the much larger Llama 3.1 8B. This improvement isn’t merely statistical noise; the researchers observed 83 percent recovery of compression-induced degradation, suggesting the quantum adapters can mitigate performance loss from model compression techniques.
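The 83 percent figure can be read as the fraction of the perplexity gap opened by compression that the adapter closes again. A minimal sketch with hypothetical perplexity values (the study reports the ratio, not these numbers, and this reading of the metric is an assumption):

```python
def degradation_recovery(ppl_base, ppl_compressed, ppl_adapted):
    """Fraction of the compression-induced perplexity gap closed by the adapter."""
    return (ppl_compressed - ppl_adapted) / (ppl_compressed - ppl_base)

# Hypothetical values: compression raises perplexity from 10.0 to 12.0,
# and the quantum adapter brings it back down to 10.34.
print(degradation_recovery(ppl_base=10.0, ppl_compressed=12.0, ppl_adapted=10.34))
# -> 0.83, i.e. 83 percent recovery
```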

Notably, SmolLM2 not only showed improved perplexity but also provided correct answers to questions that classical baselines failed to answer, highlighting a potential for quantum circuits to unlock capabilities beyond those achievable with classical architectures. The team views this work as analogous to the experimental realisation of Shor’s algorithm, establishing a foundation for scaling quantum-enhanced language models.

Noise-Expressivity Phase Transition & Quantum Utility

The successful demonstration of quantum enhancement to a large language model isn’t simply about achieving a marginal improvement; it reveals a critical interplay between quantum noise and a model’s ability to learn, a phenomenon researchers are calling a noise–expressivity phase transition. Multiverse Computing’s 1.4 percent improvement in perplexity with Llama 3.1 8B, together with its systematic SmolLM2 experiments, is a small gain that belies a significant shift in how AI models might be optimised. This isn’t merely about squeezing more performance from existing architectures, but about unlocking a fundamentally different path forward. The team’s systematic study allowed them to pinpoint a “sharp noise–expressivity phase transition,” identifying the point at which the benefits of quantum computation outweigh the detrimental effects of hardware noise. This transition is crucial because it establishes a concrete pathway toward quantum utility at larger qubit scales.

The researchers employed Cayley-parameterised unitary adapters, which are structurally entangling but factorize into independent blocks, allowing for shallow circuits executable within current coherence limits. As Borja Aizpurua and colleagues explain, the construction provides a practical and scalable route to quantum enhancement of contemporary LLMs.

Quantum Computing Resource Paradigm Shift

The prevailing narrative around quantum computing often focuses on its theoretical potential, but recent work from Multiverse Computing demonstrates a shift toward tangible results, moving beyond simulations and into demonstrable improvements on real hardware. A 1.4 percent improvement in perplexity when applied to the Llama 3.1 8B large language model is a modest figure that signals a new approach to enhancing artificial intelligence. This isn’t simply about achieving higher scores; it’s about establishing a pathway for quantum computers to contribute meaningfully to AI tasks, circumventing the limitations of classical memory scaling. These adapters, containing only 6,000 additional parameters, are trained classically while the core model remains frozen, offering a resource-efficient method for quantum integration. The design prioritises hardware efficiency; a 4×4 block requires only six free parameters to implement a full orthogonal rotation. As Borja Aizpurua and colleagues explain, this construction sidesteps the exponential scaling issues of generic unitary transformations.
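The six-parameter count follows from the dimension of the rotation group: a 4×4 skew-symmetric matrix has 4·3/2 = 6 independent entries, and the Cayley transform Q = (I − A)(I + A)⁻¹ maps it to an orthogonal matrix. A minimal numerical sketch (illustrative parameter values, not the trained ones):

```python
import numpy as np

def cayley_orthogonal_4x4(theta):
    """Map 6 free parameters to a 4x4 orthogonal matrix via the Cayley
    transform Q = (I - A)(I + A)^-1, with A skew-symmetric."""
    assert len(theta) == 6               # dim SO(4) = 4*3/2 = 6
    A = np.zeros((4, 4))
    A[np.triu_indices(4, k=1)] = theta   # fill the strict upper triangle
    A = A - A.T                          # enforce A^T = -A
    I = np.eye(4)
    return (I - A) @ np.linalg.inv(I + A)

# Illustrative parameter values.
Q = cayley_orthogonal_4x4([0.1, -0.2, 0.3, 0.05, -0.1, 0.2])
print(np.allclose(Q @ Q.T, np.eye(4)))   # True: a full orthogonal rotation
```

Because the skew-symmetric parameterisation guarantees orthogonality by construction, no constraint needs to be enforced during classical training of the adapter parameters.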

Prior Quantum Approaches to Natural Language Processing

Initial forays into merging quantum computing with large language models largely remained theoretical until recently, with most proposals constrained by the limitations of simulation. Prior work explored quantum machine learning approaches to LLMs in restricted regimes, including classification and quantum natural language processing, but these lacked the scale of contemporary models. Researchers also investigated variational sequence models and hybrid methods for LLM fine-tuning, alongside quantum self-attention demonstrated on a 72-qubit processor for text classification. However, these efforts were often limited to simulators or focused on tasks other than autoregressive generation, a core function of modern LLMs. The current research builds upon, and distinguishes itself from, these earlier attempts by focusing on a production-scale, pre-trained LLM for autoregressive language generation executed on actual gate-based quantum hardware. This work introduces a novel strategy using block-diagonal unitaries, a construction designed to be hardware-efficient, sidestepping the exponential depth increases typical of generic unitary synthesis. The authors highlight that their approach provides a practical and scalable route to quantum enhancement of contemporary LLMs, a claim supported by their demonstration of a 1.4 percent perplexity improvement with Llama 3.1 8B.

Hardware-Efficient Block-Diagonal Unitary Construction

Rather than attempting to run LLMs entirely on quantum hardware, researchers focused on strategically inserting “Cayley-parameterised block-diagonal unitary adapters” into existing classical models, a technique designed to minimise the demands on current quantum hardware. This construction allows for scalable quantum enhancement, sidestepping the exponential resource requirements of generic unitary transformations. The core innovation lies in creating unitary blocks that can be executed in parallel as shallow circuits, a critical factor given the limitations of qubit coherence. A key feature of this ansatz is its hardware efficiency: a generic d×d unitary would require exponentially growing synthesis depth, but the block-diagonal unitary (BDU) construction factorises into independent blocks. For the experiments in this work, the team fixed the block size to 4×4, resulting in a depth-19 native-gate circuit on the ibm_basquecountry processor. This approach enabled a 1.4 percent perplexity improvement of Llama 3.1 8B, achieved with only 6,000 additional parameters.
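The parallelism claim can be sketched classically: a block-diagonal operator acts on each 4-dimensional slice of the hidden vector independently, so the blocks are separate shallow circuits rather than one deep d×d unitary. Random orthogonal blocks stand in here for the trained quantum circuits:

```python
import numpy as np

rng = np.random.default_rng(1)
d, b = 16, 4                  # toy hidden size and the paper's 4x4 block size
n_blocks = d // b

# One orthogonal 4x4 block each, standing in for the trained quantum circuits.
blocks = np.stack([np.linalg.qr(rng.standard_normal((b, b)))[0]
                   for _ in range(n_blocks)])

x = rng.standard_normal(d)

# Block-diagonal action: each 4-dim slice is rotated independently, so the
# blocks can run in parallel as shallow circuits instead of one deep d x d gate.
y = np.einsum("nij,nj->ni", blocks, x.reshape(n_blocks, b)).reshape(d)

# Dense block-diagonal matrix, built only to check equivalence.
U = np.zeros((d, d))
for k in range(n_blocks):
    U[k*b:(k+1)*b, k*b:(k+1)*b] = blocks[k]
print(np.allclose(y, U @ x))  # True
```

Equivalence with the dense block-diagonal matrix confirms that nothing is lost by running the blocks independently; the circuit depth is set by the 4×4 block, not by d.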

Llama 3.1 8B Enhancement on IBM Quantum System Two

Borja Aizpurua and colleagues at Multiverse Computing have demonstrated a functional quantum-enhanced large language model, moving beyond theoretical proposals with a tangible implementation on a substantial quantum processor. The team successfully integrated quantum circuit blocks into the Llama 3.1 8B. This achievement represents a critical step toward realizing the potential of quantum computing to address the escalating resource demands of artificial intelligence. The core innovation lies in the use of Cayley-parameterised unitary adapters, which are block-diagonal unitaries inserted into the model’s projection layers. These adapters contain only 6,000 additional parameters yet yielded a 1.4 percent improvement in WikiText perplexity, a measure of how well the model predicts text. The researchers state that this systematic study allowed for a detailed analysis of hardware noise and entanglement, providing mechanistic insights into the observed performance gains and paving the way for scaling these techniques to even larger models and qubit counts. The researchers believe this work establishes a pathway for quantum utility at larger qubit scales, offering a new approach to enhance LLM performance beyond classical limitations.

Stay current. See today’s quantum computing news on Quantum Zeitgeist for the latest breakthroughs in qubits, hardware, algorithms, and industry deals.
The Neuron

With a keen intuition for emerging technologies, The Neuron brings over 5 years of deep expertise to the AI conversation. Coming from roots in software engineering, they've witnessed firsthand the transformation from traditional computing paradigms to today's ML-powered landscape. Their hands-on experience implementing neural networks and deep learning systems for Fortune 500 companies has provided unique insights that few tech writers possess. From developing recommendation engines that drive billions in revenue to optimising computer vision systems for manufacturing giants, The Neuron doesn't just write about machine learning; they've shaped its real-world applications across industries. Having built real systems used across the globe by millions of users, that deep technological base informs their writing on current and future technologies, whether AI or quantum computing.
