Ruo Cheng Huang and colleagues, from the School of Physical and Mathematical Sciences, modelled a classical agent that extracts thermodynamic work through adaptive strategies and continuous inference. Their decision-theoretic framework harnesses memory effects, allowing adaptive strategies to outperform non-adaptive ones. The team formalised this with the Time-Ordered Free Energy, a new upper bound for causal operations, and revealed a thermodynamic gap connected to adaptive ordered discord. Furthermore, an agent that uses reinforcement learning to learn an unknown quantum state while extracting work achieves sharply reduced cumulative dissipation compared to standard methods, providing a foundation for predictive and learning-based quantum thermodynamics.

Adaptive quantum work extraction bypasses full state characterisation through temporal correlations

Cumulative dissipation now scales polylogarithmically, a substantial improvement over standard tomography-based work extraction methods. This removes a long-standing obstacle to efficient work extraction from quantum systems without complete state knowledge. Conventional approaches to quantum thermodynamics require full system characterisation, a process known as quantum state tomography, which becomes increasingly impractical and resource-intensive as the quantum system grows more complex. The number of measurements required for complete tomography grows exponentially with the number of qubits, rendering it infeasible for even moderately sized systems. The agent, a classical system deliberately designed without quantum memory, achieves its improved efficiency by exploiting temporal correlations, the relationships between a quantum system’s states as it evolves over time, and by employing adaptive strategies to refine its understanding of that evolution. This adaptive behaviour allows the agent to focus on the most informative aspects of the system’s dynamics, avoiding the need to characterise the entire state space.
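To make the exponential growth concrete, the short sketch below counts the independent real parameters of a general n-qubit density matrix, which is 4^n - 1 by standard counting. It is offered as a rough proxy for tomography cost, not as a figure from the paper, and the function name is purely illustrative.

```python
def density_matrix_parameters(num_qubits: int) -> int:
    """Independent real parameters of a general n-qubit density matrix.

    A d x d Hermitian matrix with unit trace has d**2 - 1 real parameters,
    and d = 2**n for n qubits, giving 4**n - 1.
    """
    dim = 2 ** num_qubits
    return dim * dim - 1


if __name__ == "__main__":
    # Parameter count for a few system sizes.
    for n in (1, 2, 5, 10, 20):
        print(f"{n:>2} qubits: {density_matrix_parameters(n):,} parameters")
```

Even at ten qubits the count exceeds a million, which is why strategies that avoid characterising the full state are attractive.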

The new framework, formalised with the Time-Ordered Free Energy, establishes a theoretical limit for causal operations, that is, actions performed on a quantum system that respect the laws of causality. It also highlights a thermodynamic gap linked to adaptive ordered discord, paving the way for predictive quantum thermodynamics. Adaptive ordered discord quantifies the degree of non-classical correlations that can be harnessed for work extraction through adaptive strategies. Demonstrating the agent’s ability to learn an unknown quantum state while simultaneously extracting work was a key achievement, a feat previously requiring complete system characterisation. Cumulative dissipation scaled polylogarithmically, meaning the total energy lost during work extraction grows no faster than a fixed power of a logarithm; specifically, the dissipation scales as $O(\log^k n)$, where ‘n’ represents a measure of system complexity and ‘k’ is a constant. This contrasts sharply with traditional methods reliant on full quantum tomography, where dissipation typically scales linearly or polynomially with system size. This polylogarithmic scaling signifies a fundamental shift in the efficiency of quantum work extraction.
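A quick numerical comparison shows how slowly a polylogarithm grows next to a linear function. This is only an illustration: the exponent k = 2 is an arbitrary assumption, constant factors are ignored, and the helper name `polylog` is hypothetical.

```python
import math


def polylog(n: float, k: int = 2) -> float:
    """Return (ln n)**k, the growth rate behind O(log^k n), constants ignored."""
    return math.log(n) ** k


if __name__ == "__main__":
    # Compare polylogarithmic growth against linear growth in n.
    for n in (10, 10**3, 10**6, 10**9):
        ratio = n / polylog(n)
        print(f"n={n:>10}  log^2(n)={polylog(n):7.1f}  n / log^2(n)={ratio:.3g}")
```

The ratio in the last column keeps growing with n, which is the sense in which polylogarithmic dissipation eventually falls below any linear or polynomial scaling.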

Employing multi-armed bandit algorithms, a technique borrowed from reinforcement learning, allowed the agent, operating without quantum memory, to optimise its work extraction strategy over time. These algorithms enable the agent to balance exploration, trying different actions to gather information, and exploitation, choosing the action that currently yields the most work. This adaptive approach surpassed the limitations of non-adaptive methods, revealing a thermodynamic gap linked to ‘adaptive ordered discord’, a measure of how much useful work can be obtained from the system’s temporal behaviour. Currently, however, these results focus on simplified, independent and identically distributed quantum states, and do not yet demonstrate performance with complex, interacting quantum systems found in real-world applications. The independent and identically distributed assumption simplifies the analysis but limits the immediate applicability of the findings.
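The exploration and exploitation trade-off described above can be illustrated with the classical UCB1 bandit rule. This is a generic textbook algorithm standing in for the idea, not the authors' method: the arms, their success probabilities and the binary "reward" are all hypothetical placeholders for candidate extraction protocols and the work they yield.

```python
import math
import random


def ucb1(arm_means, rounds, seed=0):
    """Run UCB1 on Bernoulli arms; return pull counts and cumulative pseudo-regret.

    arm_means are hypothetical success probabilities, one per action.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # how often each arm was pulled
    totals = [0.0] * n_arms  # summed reward per arm

    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # initial exploration: try every arm once
        else:
            # Exploit the empirical mean, plus a bonus that favours rarely tried arms.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward

    # Pseudo-regret: expected reward lost by not always pulling the best arm.
    best = max(arm_means)
    regret = sum((best - m) * c for m, c in zip(arm_means, counts))
    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb1([0.2, 0.5, 0.8], rounds=2000)
    print("pulls per arm:", counts)
    print("cumulative pseudo-regret:", round(regret, 1))
```

In this analogy the cumulative regret plays the role of cumulative dissipation: work forgone because the agent did not yet know which action was best. For UCB1 the expected regret is known to grow only logarithmically with the number of rounds.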

Exploiting quantum fluctuations with reinforcement learning bypasses the need for complete system characterisation

Harnessing energy from quantum systems has long relied on fully understanding their state, a demanding task as systems grow more complex. The inherent uncertainty of quantum mechanics, coupled with the exponential growth of the state space, makes complete characterisation a significant hurdle. This research offers a compelling alternative, demonstrating that a simple, classical agent can extract work by adapting to the natural fluctuations within these systems, without needing detailed prior knowledge. The agent effectively learns to ‘predict’ the system’s behaviour based on observed correlations, allowing it to anticipate opportunities for work extraction. The current findings, however, are built upon a specific, simplified model; the agent operates effectively with independent and identically distributed quantum states, a condition rarely met in practical applications. This simplification allows for a rigorous theoretical analysis but represents a limitation for immediate implementation in more realistic scenarios.
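Why better prediction means less wasted energy can be sketched for the simplest case of a classical (diagonal) state with a degenerate Hamiltonian. Under those assumptions, a protocol tuned to a guessed distribution q but applied to the true distribution p dissipates kT times the relative entropy D(p||q). This is a standard result used here for illustration only; the numbers and function names are hypothetical and not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """Relative entropy D(p||q) in nats; q must be non-zero wherever p is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def dissipated_work(p_true, q_guess, kT=1.0):
    """Work lost (in units of kT) when the protocol assumes q_guess instead of p_true."""
    return kT * kl_divergence(p_true, q_guess)


if __name__ == "__main__":
    p_true = [0.9, 0.1]  # hypothetical true populations of a two-level system
    # Dissipation shrinks as the agent's guess approaches the true populations.
    for q0 in (0.5, 0.7, 0.85, 0.9):
        loss = dissipated_work(p_true, [q0, 1.0 - q0])
        print(f"guess={q0:.2f}  dissipation={loss:.4f} kT")
```

As the guess approaches the true populations the dissipation falls towards zero, which is the intuition behind learning while extracting.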

It is important to acknowledge that these demonstrations rely on simplified quantum states; real-world quantum systems are rarely so neatly arranged. Interactions between quantum constituents, environmental noise, and complex dynamics introduce significant challenges. Nevertheless, this work establishes a key principle: extracting useful energy doesn’t always require exhaustive knowledge of a quantum system’s inner workings. The ability to circumvent the need for full state characterisation opens up new possibilities for designing energy harvesting devices based on quantum principles. Utilising reinforcement learning, this adaptive approach offers a pathway towards practical quantum technologies where complete control is unattainable, potentially revolutionising energy harvesting at the nanoscale. Applications could include powering micro-sensors or enabling novel quantum devices with reduced energy consumption.

This research establishes a new approach to harvesting energy from quantum systems, demonstrating work extraction without complete knowledge of the system’s state. A classical agent, employing adaptive strategies and reinforcement learning, exploits temporal correlations, the way a quantum system changes over time, to generate thermodynamic work. The agent’s performance, bounded by a newly defined theoretical limit called the Time-Ordered Free Energy, achieves sharply reduced energy loss, with cumulative dissipation scaling polylogarithmically with system complexity. Future research will focus on extending this framework to more complex quantum systems and exploring the potential for implementing these adaptive strategies in physical devices, bringing us closer to realising the promise of practical quantum thermodynamics.

This research demonstrated that thermodynamic work can be extracted from quantum systems using adaptive strategies and reinforcement learning. It establishes that useful energy can be harvested without needing complete knowledge of a quantum system’s state, offering a pathway towards practical quantum technologies. By employing multi-armed bandit algorithms, the agent achieved polylogarithmic cumulative dissipation, significantly reducing energy loss during work extraction. The authors intend to extend this framework to more complex quantum systems and explore physical implementations of these adaptive strategies.

👉 More information
🗞 A Demon that remembers: An agential approach towards quantum thermodynamics of temporal correlations
🧠 ArXiv: https://arxiv.org/abs/2604.04462
