Large Language Models Achieve 90% Success in Autonomous Quantum Simulation

Simulating quantum systems is a formidable challenge, traditionally demanding extensive expertise in techniques such as tensor networks. Weitang Li, Jiajun Ren from Beijing Normal University, and Lixue Cheng from The Hong Kong University of Science and Technology, together with Cunxi Gong and colleagues, have now demonstrated a novel approach that uses large language model (LLM) agents to conduct these simulations autonomously. Their research shows that these AI agents can perform tensor network simulations with a success rate of approximately 90% across a range of benchmark tasks, including those modelling phase transitions and photochemical reactions. The result is significant because it bypasses the need for years of specialist training, allowing rapid deployment of AI in specialised computational domains and potentially accelerating discoveries in quantum physics and chemistry. The team’s systematic evaluation, employing models such as DeepSeek-V3.2 and Gemini 2.5 Pro, highlights the importance of both in-context learning and a multi-agent architecture for robust and reliable results.

The simulations achieved a success rate of approximately 90% across a range of representative benchmark tasks, highlighting the potential of artificial intelligence in this complex field. Tensor network methods are powerful tools for quantum simulation, yet their effective implementation traditionally demands considerable expertise gained through extensive graduate-level training. This work lowers that barrier by integrating in-context learning, with carefully curated documentation, and a multi-agent decomposition strategy. The approach equips autonomous AI agents to operate within specialised computational domains, automating a process previously reliant on highly skilled researchers.

Benchmarking LLM Configurations for Scientific Tasks

Large language models are increasingly being explored as tools for scientific research, demonstrating capabilities across diverse tasks. Researchers have been augmenting these models with external tools and memory to create autonomous agents capable of complex problem-solving. Recent advancements, such as ChemCrow, Coscientist, and El Agente Q, showcase successes in areas like chemical research, synthesis, and quantum chemistry, highlighting the potential of LLM-driven agents in scientific discovery. These agents build upon decades of work towards self-driving laboratories and represent a significant step towards automating scientific processes.

This study benchmarks three configurations of large language models: a baseline, a single agent with in-context learning, and a multi-agent system also utilising in-context learning, across problems in quantum phase transitions, open quantum system dynamics, and photochemical reactions. Each run completed within minutes, and the study systematically evaluated DeepSeek-V3.2, Gemini 2.5 Pro, and Claude Opus 4.5.

The methodology focuses on assessing the impact of both in-context learning and a multi-agent architecture on the accuracy and efficiency of solving these complex scientific problems. Analysis of the results reveals that both in-context learning and the multi-agent architecture are crucial for successful performance. Researchers identified characteristic failure patterns across the different models tested, providing insights into the limitations of each approach. Notably, the multi-agent configuration significantly reduced implementation errors and instances of hallucination compared to the simpler baseline and single-agent setups.
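
For readers who want a sense of what such a comparison involves, the sketch below outlines a minimal benchmarking loop over models, configurations, and tasks. The model, configuration, and task names follow the article, but the helper functions are hypothetical stand-ins, not the authors' actual evaluation harness:

```python
from itertools import product

# Hypothetical benchmark harness. Model, configuration, and task names
# follow the article; the helpers below are illustrative stand-ins.
MODELS = ["DeepSeek-V3.2", "Gemini 2.5 Pro", "Claude Opus 4.5"]
CONFIGS = ["baseline", "single-agent + ICL", "multi-agent + ICL"]
TASKS = ["ising_phase_transition", "spin_boson_dynamics",
         "retinal_photoisomerization"]
N_TRIALS = 10  # repeated attempts per cell to estimate a success rate

def run_agent(model: str, config: str, task: str) -> dict:
    # Placeholder: would drive the LLM agent(s) through planning,
    # code generation, execution, and analysis.
    return {"observable": None}

def check_result(task: str, artefacts: dict) -> bool:
    # Placeholder: compare the agent's output against a trusted
    # reference solution within a numerical tolerance.
    return artefacts["observable"] is not None

success = {}
for model, config, task in product(MODELS, CONFIGS, TASKS):
    wins = sum(check_result(task, run_agent(model, config, task))
               for _ in range(N_TRIALS))
    success[(model, config, task)] = wins / N_TRIALS
```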

Tensor network methods are a powerful paradigm in computational science, supported by an increasingly complex ecosystem of more than 30 actively maintained software packages. Effective use of these methods requires expertise in network structures, bond dimension control, symmetry implementations, and observable computation. This research suggests a pathway for applying large language models to automate tensor network simulations, potentially streamlining workflows and broadening access to these powerful computational tools.
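
As a concrete illustration of one of those skills, bond dimension control, the following minimal sketch compresses a state vector into a matrix product state via successive singular value decompositions, truncating each bond to a fixed maximum. It uses plain NumPy rather than any of the packages discussed in the paper:

```python
import numpy as np

def state_to_mps(psi, n_sites, d=2, max_bond=8):
    """Decompose a state vector into a matrix product state (MPS).

    psi      : state vector of length d**n_sites
    max_bond : cap on each bond dimension; truncation makes the
               compression lossy but keeps memory and runtime bounded
    """
    tensors = []
    chi = 1  # current left bond dimension
    psi = psi.reshape(chi, -1)
    for _ in range(n_sites - 1):
        # Split off one physical site, then SVD across the bond.
        psi = psi.reshape(chi * d, -1)
        u, s, vh = np.linalg.svd(psi, full_matrices=False)
        keep = min(max_bond, len(s))           # bond dimension control
        u, s, vh = u[:, :keep], s[:keep], vh[:keep, :]
        tensors.append(u.reshape(chi, d, keep))
        psi = np.diag(s) @ vh                  # push weights rightwards
        chi = keep
    tensors.append(psi.reshape(chi, d, 1))
    return tensors

# Example: compress a random 10-qubit state with bond dimension 8.
n = 10
psi = np.random.randn(2**n) + 1j * np.random.randn(2**n)
psi /= np.linalg.norm(psi)
mps = state_to_mps(psi, n, max_bond=8)
print([t.shape for t in mps])
```

Raising `max_bond` improves fidelity at the cost of memory and runtime, which is precisely the trade-off that expert users of tensor network packages learn to tune.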

LLM Agents Autonomously Perform Tensor Network Simulations

Scientists have achieved a breakthrough in autonomous scientific computing, demonstrating that large language model (LLM) agents can independently perform tensor network simulations with a success rate of approximately 90% across a range of benchmark tasks. This work overcomes a significant hurdle in computational science: effective use of tensor network methods has traditionally required years of graduate-level training. The team developed AI agents capable of mastering specialised computational domains within minutes, combining in-context learning with carefully curated documentation and a novel multi-agent decomposition strategy. Experiments reveal a substantial improvement in both accuracy and efficiency compared to single-agent approaches.

The research focused on addressing three key difficulties inherent in coupling physics, code, and numerical data. Tensor network methods are sparsely represented in standard LLM training data, leading to frequent hallucinations when models attempt simulations without sufficient knowledge. Furthermore, simulations generate dense numerical outputs demanding precise quantitative analysis, a task LLMs struggle with through direct reasoning alone. Finally, the interconnected nature of the process makes it challenging for a single agent to independently validate results, as errors can originate from physics, code, or data.
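
The second difficulty, precise analysis of dense numerical output, is usually tackled by having the agent write analysis code rather than reason over raw numbers. The toy example below, with synthetic magnetisation data standing in for real simulation output (not taken from the paper), shows the idea:

```python
import numpy as np

# Synthetic stand-in for a simulation's dense numerical output:
# magnetisation m(T) for the 2D Ising model, vanishing above Tc.
Tc_true = 2 / np.log(1 + np.sqrt(2))   # exact critical temperature, ~2.269
T = np.linspace(1.5, 3.0, 61)
m = np.clip(1 - (T / Tc_true) ** 4, 0.0, None) ** 0.125  # toy curve shape

# Delegating the quantitative step to code: the steepest drop in m(T)
# locates the transition more reliably than reading numbers "by eye".
dm_dT = np.gradient(m, T)
Tc_est = T[np.argmin(dm_dT)]
print(f"estimated Tc = {Tc_est:.3f}, exact Tc = {Tc_true:.3f}")
```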

To counter these challenges, the scientists embedded approximately 43,000 tokens of curated Renormalizer documentation, comprising Jupyter notebook tutorials (22,000 tokens), Python script examples (12,000 tokens), and refactored source code snippets (9,000 tokens), directly into the system prompt. Systematic evaluation using DeepSeek-V3.2, Gemini 2.5 Pro, and Claude Opus 4.5 confirmed the critical importance of in-context learning and the benefits of a multi-agent architecture.
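
A minimal sketch of how such documentation might be assembled into a system prompt is shown below. The file layout and the crude four-characters-per-token estimate are assumptions for illustration; the authors' actual pipeline is not described in this summary:

```python
from pathlib import Path

# Hypothetical file layout mirroring the three documentation categories;
# names and the token heuristic are illustrative assumptions.
DOC_SOURCES = [
    "tutorial_notebooks.md",   # Jupyter tutorials, converted to text (~22k tokens)
    "script_examples.py",      # runnable Python examples (~12k tokens)
    "source_snippets.py",      # refactored library internals (~9k tokens)
]

def approx_tokens(text: str) -> int:
    return len(text) // 4  # crude heuristic: roughly 4 characters per token

def build_system_prompt(doc_dir: str, budget: int = 43_000) -> str:
    """Concatenate curated documentation into one in-context block."""
    parts, used = [], 0
    for name in DOC_SOURCES:
        text = Path(doc_dir, name).read_text()
        cost = approx_tokens(text)
        if used + cost > budget:
            break  # stay within the intended context budget
        parts.append(f"## {name}\n{text}")
        used += cost
    header = "You are a tensor-network simulation assistant.\n\n"
    return header + "\n\n".join(parts)
```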

The team designed a system comprising a central Conductor coordinating seven specialised agents, each dedicated to a specific subtask, from research planning to data visualisation. This ‘context quarantine’ approach isolates reasoning modes, preventing interference and improving focus. Benchmarking on quantum phase transitions in the two-dimensional Ising model, spin dynamics in the sub-Ohmic spin-boson model, and retinal photoisomerization demonstrated the effectiveness of this architecture. Results demonstrate that the multi-agent configuration substantially reduces implementation errors and hallucinations compared to simpler architectures.
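
The sketch below illustrates the general shape of such a Conductor-and-specialists design, with each agent keeping a private context so that reasoning modes do not bleed into one another. The role names and call interface are assumptions for illustration; the paper's actual agent definitions may differ:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    role: str
    system_prompt: str
    history: list = field(default_factory=list)  # private, quarantined context

    def run(self, task: str) -> str:
        self.history.append(task)
        # Placeholder for an LLM call seeded only with this agent's own
        # prompt and history, never another agent's reasoning trace.
        return f"[{self.role}] completed: {task}"

# Seven illustrative specialist roles, spanning planning to visualisation.
ROLES = ["planner", "model builder", "coder", "executor",
         "numerical analyst", "validator", "visualiser"]

class Conductor:
    """Decomposes a simulation request and routes subtasks to specialists."""
    def __init__(self):
        self.agents = {r: Agent(r, f"You handle {r} duties only.")
                       for r in ROLES}

    def simulate(self, request: str) -> list:
        results = []
        for role in ROLES:  # fixed pipeline for simplicity
            results.append(self.agents[role].run(f"{role} step for: {request}"))
        return results

for line in Conductor().simulate("2D Ising phase transition"):
    print(line)
```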

The agents autonomously generated publication-quality figures without manual modification or post-processing, showcasing the system’s ability to deliver complete scientific workflows. The baseline single agent, lacking documentation, struggled to retrieve accurate information from the Renormalizer source code, while the single-agent configuration with documentation showed improvement, but was surpassed by the multi-agent system’s focused approach. This breakthrough paves the way for automated scientific discovery and accelerates research in complex physical systems.

👉 More information
🗞 Autonomous Quantum Simulation through Large Language Model Agents
🧠 ArXiv: https://arxiv.org/abs/2601.10194

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
