Researchers are increasingly focused on the limitations of current field-programmable gate array (FPGA) accelerators, which typically rely on pre-trained models for low-latency inference. Duc Hoang of the Massachusetts Institute of Technology and colleagues argue that learning must be integrated directly into the FPGA fabric to overcome these constraints. Their work advocates ultrafast on-chip learning, enabling both inference and training to occur with sub-microsecond latency. This advance is significant because it promises to unlock closed-loop systems that adapt to dynamic environments at the speed of the physical processes they govern, with potential applications ranging from cryogenic qubit calibration to autonomous scientific experimentation, ultimately transforming FPGAs into real-time learning machines.
Ultrafast on-chip learning for real-time FPGA acceleration enables low-latency inference and training
Scientists are advocating for a fundamental shift in FPGA accelerator design, moving beyond static inference to incorporate ultrafast on-chip learning. Current domain-specialized FPGAs excel at low-latency inference for scientific and industrial applications, but rely on models trained offline on CPUs or GPUs.
This separation creates a bottleneck in dynamic, high-frequency environments where model adaptation must occur at the timescale of the underlying physical processes. The research proposes integrating both inference and training directly within the FPGA fabric, achieving deterministic, sub-microsecond latency for both operations.
This innovation would enable closed-loop systems capable of adapting as rapidly as the phenomena they control, with potential applications in areas such as quantum error correction, cryogenic qubit calibration, plasma and fusion control, and autonomous scientific experiments. This work details a vision for transforming FPGAs from static inference engines into real-time learning machines, necessitating a co-design of algorithms, architectures, and toolflows.
Existing learning methods, optimised for offline training, assume abundant memory and relaxed timing constraints, which are incompatible with hard real-time requirements. The proposed approach prioritises minimising latency, memory footprint, and worst-case update cost, enabling continuous learning from streaming data and immediate reaction to dynamic changes.
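To make this contrast concrete, here is a minimal Python sketch (an illustration, not the paper's implementation) of a single-sample, fixed-point update rule in the spirit described above: per-sample work is constant, no batch is buffered, and the only persistent state is one weight register, so the worst-case update cost is known in advance. The Q-format widths and learning rate are invented for the example.

```python
# Minimal illustration (not the paper's implementation): a single-sample,
# Q-format fixed-point update whose worst-case cost per input is constant --
# no batching, no dynamic memory, one persistent weight register.

FRAC_BITS = 8               # fractional bits in the Q-format (assumption)
SCALE = 1 << FRAC_BITS

def to_fixed(x: float) -> int:
    return int(round(x * SCALE))

def fixed_mul(a: int, b: int) -> int:
    return (a * b) >> FRAC_BITS

def stream_update(w: int, x: int, y: int, lr: int) -> int:
    """One bounded-cost LMS-style update of a scalar model y_hat = w*x."""
    err = y - fixed_mul(w, x)
    return w + fixed_mul(lr, fixed_mul(err, x))

# Learn w ~= 0.5 one streaming sample at a time.
w, lr = to_fixed(0.0), to_fixed(0.25)
for _ in range(200):
    x = to_fixed(1.0)
    y = fixed_mul(to_fixed(0.5), x)
    w = stream_update(w, x, y, lr)

# w settles a few LSBs below 0.5: the quantisation floor of fixed precision.
print(w / SCALE)
```

The quantisation floor at the end is the kind of fixed-precision stability question such learning rules must confront.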
This represents a departure from conventional optimisation goals focused on throughput or statistical efficiency in large batches. The study highlights a critical gap in current quantum systems, where continuous calibration is essential due to environmental drifts and device fluctuations. Existing calibration methods are often episodic and host-driven, introducing unacceptable latency for sustained, autonomous operation.
Implementing reinforcement learning on host processors reintroduces non-deterministic delays, hindering fast stabilisation. The research posits that on-chip learning is crucial for quantum control, allowing systems to learn continuously from streaming data and respond at the timescale of the underlying quantum dynamics, a feat unattainable with today’s periodically retrained models. This advancement promises to bridge the gap between the demands of large-scale quantum machines and the capabilities of current control stacks.
FPGA-embedded reinforcement learning for deterministic quantum control optimisation offers a promising path towards real-time applications
A 72-qubit superconducting processor serves as the foundation for demonstrating ultrafast on-chip learning, shifting the paradigm from static models to adaptive systems. This research prioritises integrating learning directly into the FPGA fabric, enabling both inference and training to occur with deterministic, sub-microsecond latency.
The study addresses a critical systems-level gap in quantum computing, where continual calibration is essential for maintaining performance amidst non-stationary hardware parameters and environmental drifts. Rather than relying on conventional host-driven training, the work implements reinforcement learning (RL) within the FPGA to optimise control actions from streaming reward signals without requiring an explicit device model.
This approach circumvents the non-deterministic software and interconnect latencies that hinder fast stabilisation, effectively eliminating the separation between learning and execution. The methodology focuses on fixed-precision computation and strict memory constraints, prioritising minimal latency and worst-case update cost over statistical efficiency.
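As an illustration of model-free learning from streaming rewards, the following sketch (the action count, reward model, and exploration rate are invented, not taken from the paper) implements an integer-only epsilon-greedy bandit whose per-step update cost is constant and whose state fits in a few fixed-size arrays.

```python
import random

# Hypothetical sketch (not the paper's implementation): an integer-only,
# epsilon-greedy bandit that learns which control action earns the highest
# streaming reward. Each update is O(1) with fixed memory, in the spirit of
# model-free on-chip reinforcement learning.

N_ACTIONS = 4               # e.g. candidate control settings (assumption)
counts = [0] * N_ACTIONS    # persistent state: times each action was tried
totals = [0] * N_ACTIONS    # persistent state: summed integer rewards

def choose() -> int:
    """Explore with probability 1/10, otherwise exploit the best estimate."""
    if random.randrange(10) == 0:
        return random.randrange(N_ACTIONS)
    best = 0
    for a in range(1, N_ACTIONS):
        # Compare mean rewards without division: t[a]/c[a] > t[b]/c[b]
        # is equivalent to t[a]*c[b] > t[b]*c[a] for positive counts.
        if totals[a] * max(counts[best], 1) > totals[best] * max(counts[a], 1):
            best = a
    return best

def update(action: int, reward: int) -> None:
    """O(1) update from one streaming reward measurement."""
    counts[action] += 1
    totals[action] += reward

random.seed(0)
TRUE_BEST = 2
for _ in range(2000):
    a = choose()
    # Simulated plant: noisy integer reward, highest for TRUE_BEST (invented).
    reward = (10 if a == TRUE_BEST else 5) + random.randrange(-2, 3)
    update(a, reward)
```

The division-free comparison is one example of restructuring a learning rule so it maps onto cheap fixed-point hardware.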
Specifically, the research targets a calibration loop operating at 1μs, a timescale below the representative qubit decoherence time of approximately 10μs in silicon qubits. This rapid feedback is designed to average over slow charge noise while reacting to higher-frequency fluctuations, potentially extending effective qubit coherence significantly.
The study leverages on-chip function approximators alongside RL-style closed-loop control to achieve this, creating a system capable of continuous learning from streaming data and reacting at the timescale of the underlying physical dynamics. This methodology represents a departure from periodically retrained models, enabling adaptive computation unattainable with current systems.
Deterministic on-chip learning for continuous-time quantum processor calibration enables automated and precise control parameter optimization
Researchers are advocating a shift towards ultrafast on-chip learning within field-programmable gate arrays, enabling both inference and gradient-based updates to occur directly on streaming data with deterministic, sub-microsecond latency. This approach aims to integrate learning into the hardware datapath, allowing adaptive systems to respond at the timescale of the physical processes they control.
The work focuses on minimizing latency, memory footprint, and worst-case update cost, rather than maximizing throughput or statistical efficiency in large batches. Quantum processors, being analog control systems, require continual calibration to maintain performance due to environmental drifts and device fluctuations.
Current calibration methods are often episodic and host-driven, involving data transfer to CPUs or GPUs for processing before updating control settings. This introduces delays that are incompatible with continuous runtimes needed for fault-tolerant algorithms, creating a systems-level gap in quantum machine control.
Reinforcement learning offers a potential solution, but its implementation on a host reintroduces non-deterministic latencies. The research highlights the opportunity to replace slow outer loops with on-chip learning controllers that ingest streaming measurements and apply corrective actions with deterministic, fast feedback.
A target calibration loop time of 1μs is proposed, operating below the qubit decoherence timescale of approximately 10μs in silicon qubits. In semiconductor quantum dot platforms, charge rearrangements can degrade charge-sensing contrast, necessitating frequent re-centering of the operating point. Existing methods rely on manual tuning or intermittent host-driven optimization, which do not scale with device complexity.
Ultrafast on-chip learning, combining reinforcement learning with stable, fixed-precision function approximators, could enable calibration and stabilization cycles at sub-microsecond latency. This has the potential to average over slow charge noise and react to higher-frequency fluctuations, potentially increasing qubit coherence time by orders of magnitude.
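The re-centering idea lends itself to a compact fixed-point controller. The sketch below (illustrative only; the shift width, drift rate, and noise model are invented) tracks a slowly drifting operating point with a bit-shift exponential moving average, smoothing fast jitter while following the drift:

```python
import random

SHIFT = 4    # EMA weight of 1/16, implementable as a bit-shift (assumption)

def recenter(acc: int, measurement: int) -> int:
    """One O(1) update of a fixed-point accumulator holding estimate << SHIFT."""
    return acc + (measurement - (acc >> SHIFT))

random.seed(1)
acc = 0
true_center = 0
for step in range(4000):
    if step % 40 == 0:
        true_center += 1                            # slow drift of the operating point
    noisy = true_center + random.randrange(-8, 9)   # fast jitter on the readout
    acc = recenter(acc, noisy)

estimate = acc >> SHIFT
print(true_center, estimate)   # the estimate follows the drifted operating point
```

Keeping the accumulator pre-shifted avoids the rounding bias of updating the estimate directly, a small but typical fixed-precision design choice.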
Ultrafast on-chip learning for real-time adaptive systems requires novel device paradigms
Domain-specialized field-programmable gate arrays (FPGAs) have demonstrated high performance for rapid inference in both scientific and industrial applications, but current systems typically rely on static models trained offline, with learning relegated to slower central processing units or graphics processing units. This separation limits the capabilities of systems operating in dynamic environments where model updates must align with the timescale of the underlying physical processes.
This work advocates for a transition towards ultrafast on-chip learning, integrating both inference and training directly within the FPGA fabric with deterministic, sub-microsecond latency. Enabling inference and learning within the same real-time datapath would facilitate closed-loop systems capable of adapting as quickly as the physical processes they control, with potential applications including cryogenic qubit calibration, plasma and fusion control, accelerator tuning, and autonomous scientific experimentation.
Realising this potential necessitates a combined rethinking of algorithms, architectures, and computer-aided design (CAD) toolflows to transform FPGAs from static inference engines into real-time learning machines. Current CAD tools are optimised for static datapaths and forward-only computations, posing a challenge for on-chip learning which requires support for stateful designs with tightly bounded latency, including explicit update logic, persistent parameter storage, and guaranteed worst-case scheduling under streaming input/output.
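To make "stateful designs with explicit update logic, persistent parameter storage, and guaranteed worst-case scheduling" concrete, here is an algorithm-level Python sketch (names, sizes, and shift values are invented, not from the paper). Every loop has a fixed trip count, so a CAD flow could in principle unroll and schedule it with a known worst-case latency:

```python
# Hypothetical algorithm-level sketch of a "stateful design": persistent
# parameter storage, explicit update logic, and a step() whose operation count
# is fixed, so worst-case latency could be scheduled statically. Names and
# sizes are illustrative, not from the paper.

class OnChipLearner:
    N_FEATURES = 4          # fixed at "synthesis time" (assumption)
    FRAC_BITS = 8           # fixed-point fractional bits
    LR_SHIFT = 12           # learning-rate shift (assumption)

    def __init__(self) -> None:
        # Persistent parameter storage (would map to on-chip registers/BRAM).
        self.weights = [0] * self.N_FEATURES

    def step(self, x: list[int], y: int) -> int:
        """Inference plus gradient-style update in one bounded pass per sample."""
        y_hat = 0
        for i in range(self.N_FEATURES):            # fixed-length dot product
            y_hat += (self.weights[i] * x[i]) >> self.FRAC_BITS
        err = y - y_hat                             # explicit update logic
        for i in range(self.N_FEATURES):            # bounded update loop
            self.weights[i] += (err * x[i]) >> self.LR_SHIFT
        return y_hat

learner = OnChipLearner()
for _ in range(3000):
    x = [256, 256, 256, 256]    # constant stimulus keeps the sketch minimal
    y = 640                     # target output in the same fixed-point scale
    learner.step(x, y)
print(learner.step(x, y))       # prediction has converged near y = 640
```

Because nothing in `step` is data-dependent in length or memory use, the routine is the kind of kernel a toolflow could treat as a compilation target with hard latency and resource bounds.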
The key requirement is tool support to co-optimise compute, memory, and control to ensure updates meet strict real-time timing and resource constraints. While FPGAs have proven capable of delivering deterministic, nanosecond-scale inference, existing accelerators are limited by the assumption that learning occurs off-chip.
For emerging non-stationary systems, particularly in quantum control and other high-frequency scientific workloads, this separation hinders closed-loop adaptation, making it both slow and unpredictable. Future research should focus on learning algorithms stable in fixed precision, architectures that efficiently manage state updates without compromising determinism, and CAD toolflows that treat continuous, stateful learning as a primary compilation target. If successful, this shift would enable adaptive instruments and controllers capable of tracking rapidly evolving dynamics.
👉 More information
🗞 Position: The Need for Ultrafast Training
🧠 ArXiv: https://arxiv.org/abs/2602.02005
