Scientists are increasingly focused on developing methods for ultrafast online learning, a critical capability for high-frequency applications including quantum computing controls and nuclear fusion systems. Duc Hoang, Aarush Gupta, and Philip Harris, all from MIT, detail a novel approach utilising Kolmogorov-Arnold Networks (KANs) that addresses the limitations of conventional Multi-Layer Perceptrons in low-latency, resource-constrained environments. Their research demonstrates that KANs, leveraging B-spline locality, offer sparse updates and inherent robustness to fixed-point quantization, enabling superior on-chip scaling. By implementing fixed-point online training on Field-Programmable Gate Arrays, the team showcases significantly improved efficiency and expressiveness compared to MLPs, and importantly, presents the first demonstration of model-free online learning achieved at sub-microsecond latencies.
Real-time adaptation using sparse Kolmogorov Arnold Networks on FPGAs enables efficient and flexible hardware acceleration
Scientists have achieved model-free online learning at sub-microsecond latencies, a breakthrough essential for high-frequency systems like quantum computing and nuclear fusion controls. These domains demand adaptation on timescales far shorter than conventional training methods can deliver, with adjustments occurring within fractions of a microsecond.
This work introduces a novel implementation of Kolmogorov-Arnold Networks (KANs) that overcomes limitations of existing approaches, enabling real-time adaptation directly on-chip. Because each B-spline basis function has local support, every training sample updates only a handful of coefficients. This sparsity allows for superior scaling of on-chip resources, reducing the computational burden and memory requirements. Furthermore, KANs exhibit inherent robustness to fixed-point quantization, a critical feature for low-precision computation on embedded systems.
By implementing this approach on FPGAs, a representative platform for on-chip computation, the study showcases a substantial reduction in both resource usage and latency. Detailed analysis reveals that, under equal parameter budgets and precision constraints, KANs outperform MLPs in numerical stability and update cost.
Figures demonstrate that while MLPs exhibit a linear increase in on-chip resources and latency with parameter count, KANs maintain near-constant resource usage and achieve sub-100 ns latency. This advancement circumvents the limitations of traditional host-accelerator training loops, which are too slow for these rapidly evolving systems, and overcomes the instability issues associated with reduced-precision gradient-based optimization in MLPs.
This research establishes a foundation for truly real-time, adaptive control systems, opening possibilities for advancements in plasma diagnostics, high-speed communications, and quantum control. The demonstrated ability to perform both inference and training directly on-chip, with low-latency and fixed-precision computation, represents a significant step towards fully autonomous and responsive high-frequency systems.
Implementation of sparse KANs for fixed-precision online learning on FPGA and superconducting platforms offers significant energy and performance benefits
A 72-qubit superconducting processor provides one testbed for this research, which investigates the ultrafast online learning essential for high-frequency systems. The methodology leverages the inherent sparsity of KAN updates, exploiting B-spline locality to achieve superior on-chip resource scaling.
KAN layers apply learnable univariate spline maps, calculating new values based on weighted B-spline basis functions arranged on a grid. Each basis function possesses local support, meaning only a limited number of coefficients are non-zero for any given input coordinate, directly reducing per-sample update costs.
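This local-support property can be made concrete with a minimal sketch. The snippet below uses degree-1 (hat) B-splines on a uniform grid for simplicity, rather than the cubic splines and hardware kernels of the paper: for any input, only two of the grid's coefficients participate in the output, no matter how fine the grid is.

```python
import numpy as np

def hat_basis(x, grid):
    """Degree-1 (hat) B-spline basis on a uniform grid.

    Returns (values, active_indices). Only two basis functions are
    non-zero at any x, regardless of the number of grid points --
    the local support that keeps per-sample work constant.
    """
    x = float(np.clip(x, grid[0], grid[-1]))
    i = min(int(np.searchsorted(grid, x, side="right")) - 1, len(grid) - 2)
    t = (x - grid[i]) / (grid[i + 1] - grid[i])
    return np.array([1.0 - t, t]), np.array([i, i + 1])

def edge_forward(x, coeffs, grid):
    """One KAN edge: a learnable univariate spline map."""
    values, idx = hat_basis(x, grid)
    return float(values @ coeffs[idx])

grid = np.linspace(0.0, 1.0, 9)          # 9 knots
coeffs = grid.copy()                      # identity map as a toy example
print(edge_forward(0.5, coeffs, grid))    # reads only coeffs[4] and coeffs[5]
```

Because the basis values at any point sum to one, the edge output is a weighted average of nearby coefficients, a fact the quantization analysis below relies on.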
This contrasts with MLPs, which require dense gradient calculations across all parameters. To quantify this advantage, the work compared dense MLPs and KANs with an identical parameter budget and uniform fixed-point quantization. Researchers established that the update complexity for KANs is proportional to the number of active coefficients per edge, independent of total capacity, while MLPs require proportionally more arithmetic operations with increasing capacity.
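The sparse per-sample update can be sketched in a few lines. This is a toy squared-error SGD step on a single edge with degree-1 splines, not the paper's FPGA implementation: the gradient with respect to every inactive coefficient is exactly zero, so each step reads and writes only the active entries.

```python
import numpy as np

def active_basis(x, grid):
    """Two active hat-basis values and their indices (local support)."""
    x = float(np.clip(x, grid[0], grid[-1]))
    i = min(int(np.searchsorted(grid, x, side="right")) - 1, len(grid) - 2)
    t = (x - grid[i]) / (grid[i + 1] - grid[i])
    return np.array([1.0 - t, t]), np.array([i, i + 1])

def sparse_sgd_step(x, target, coeffs, grid, lr=0.5):
    """One online step on a single KAN edge for squared error.

    Only two entries of `coeffs` change per sample -- O(k+1) work,
    versus O(P) arithmetic for a dense MLP gradient step.
    """
    values, idx = active_basis(x, grid)
    pred = values @ coeffs[idx]
    coeffs[idx] -= lr * (pred - target) * values  # sparse write
    return float(pred)

grid = np.linspace(0.0, 1.0, 17)
coeffs = np.zeros(17)
before = coeffs.copy()
sparse_sgd_step(0.3, 1.0, coeffs, grid)
print(np.nonzero(coeffs != before)[0])  # -> [4 5]: two coefficients touched
```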
Theoretical analysis demonstrated that increasing grid resolution in KANs improves approximation quality without increasing per-sample compute, a decoupling that benefits continual learning by preserving prior fits. Furthermore, the study highlighted KANs’ robustness to fixed-point quantization, stemming from their bounded activations and inherent magnitude normalization.
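The decoupling of capacity from per-sample compute can be seen directly in a toy check (again assuming degree-1 hat bases on a uniform grid): refining the grid adds coefficients, but the number of bases active at any input stays fixed.

```python
import numpy as np

def hat_values(x, grid):
    """All degree-1 B-spline (hat) basis values at x on a uniform grid."""
    return np.maximum(0.0, 1.0 - np.abs(x - grid) / (grid[1] - grid[0]))

for n_knots in (9, 65, 513):
    grid = np.linspace(0.0, 1.0, n_knots)
    active = np.count_nonzero(hat_values(0.37, grid))
    print(n_knots, active)   # capacity grows; active count stays at 2
```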
Unlike MLPs, whose outputs scale with input magnitude, KAN activations are convex combinations of learned coefficients and therefore remain bounded regardless of input magnitude. In the drift-tracking experiments, parameter-matched Multi-Layer Perceptrons (MLPs) failed to track the drifts, exhibiting steadily increasing regret throughout the experiment.
A larger MLP configuration eventually recovered, but adaptation occurred more slowly, resulting in higher overall regret. Under fixed-point quantization with 2 integer bits, KAN maintained stability across varying total bitwidths, experiencing only mild degradation at the lowest precision levels. In stark contrast, the MLP exhibited a sharp precision cliff, where low bitwidths led to substantial final regret and high variance.
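A toy sketch can illustrate why the convex-combination structure helps under such a format. The quantizer below assumes a saturating signed fixed-point layout (sign, one further integer bit, and the remaining bits fractional); this convention is illustrative, not necessarily the paper's exact format. A KAN output bounded by its coefficients stays inside the representable range, while an MLP pre-activation (an unnormalized weighted sum) can grow past it and saturate.

```python
import numpy as np

def quantize(x, total_bits, int_bits=2):
    """Round to a signed fixed-point grid with saturation.

    Assumed layout: 2 integer bits (incl. sign) + fractional bits,
    giving a representable range of roughly [-2, 2).
    """
    frac_bits = total_bits - int_bits
    step = 2.0 ** -frac_bits
    lo, hi = -2.0 ** (int_bits - 1), 2.0 ** (int_bits - 1) - step
    return np.clip(np.round(np.asarray(x) / step) * step, lo, hi)

coeffs = np.array([0.9, -0.7, 0.4])
basis = np.array([0.25, 0.5, 0.25])     # basis values sum to 1
kan_out = basis @ coeffs                 # bounded by max|coeff| -> safe
w = np.array([1.4, -1.2, 0.9])
x = np.array([1.5, -1.3, 1.1])
mlp_pre = w @ x                          # 4.65: outside the [-2, 2) range
print(quantize(kan_out, 8), quantize(mlp_pre, 8))  # MLP value saturates
```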
Increasing bitwidth was necessary to regain stable learning performance for the MLP architecture. Further analysis revealed that KAN performance benefited from increasing grid size until reaching a quantization-limited floor determined by bitwidth. Conversely, increasing the parameter count in MLPs yielded diminishing returns and amplified sensitivity to quantization, destabilizing learning at low precision.
For adaptive single-shot qubit readout, running accuracy exceeded 0.8 after 6000 time steps using quantized KAN, effectively tracking drift and avoiding the collapse observed in quantized MLPs. Learned decision boundaries at time step 6000 demonstrated KAN’s ability to track drifting IQ distributions despite substantial phase drift.
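The flavour of this experiment can be mimicked with a toy simulation: two IQ blobs whose phase drifts slowly while an online learner tracks the decision boundary and a running accuracy is logged. The linear logistic learner below is a deliberately simplified stand-in for the paper's quantized KAN, and all constants (drift rate, noise, learning rate) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
w = np.zeros(3)                          # linear boundary in the IQ plane (+bias)

def step(t):
    phase = 0.002 * t                    # slow phase drift of the IQ blobs
    label = int(rng.integers(0, 2))
    center = (1 if label else -1) * np.array([np.cos(phase), np.sin(phase)])
    iq = center + 0.3 * rng.normal(size=2)
    feat = np.array([iq[0], iq[1], 1.0])
    pred = 1 if feat @ w > 0 else 0
    p = 1.0 / (1.0 + np.exp(-feat @ w))  # online logistic update
    w[:] += 0.1 * (label - p) * feat
    return pred == label

hits = [step(t) for t in range(6000)]
running = float(np.mean(hits[-1000:]))   # running accuracy over recent steps
print(round(running, 3))
```

Even this crude learner tracks the rotating boundary when the drift per step is small; the paper's contribution is doing so with fixed-point KAN updates at hardware latencies.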
KAN performance improved monotonically with larger grid sizes, even under aggressive quantization, while increasing the number of parameters in MLPs provided only marginal accuracy gains and often destabilized convergence. Evaluating online policy optimization on Acrobot-v1, KAN sustained online improvement with fixed-point updates despite randomized link masses and lengths.
This contrasts with MLP baselines, where performance was less consistent, and a streaming value-learning baseline that lacked the adaptive capabilities of the KAN-based approach. These results demonstrate the efficacy of KANs in handling non-stationary dynamics and stochastic exploration within a fully online update framework.
KANs achieve efficient and robust online learning on FPGAs by leveraging parallelism and dedicated hardware resources
Researchers have demonstrated ultrafast online learning using Kolmogorov-Arnold Networks (KANs), achieving model-free adaptation at sub-microsecond latencies. This represents a significant advancement for high-frequency systems requiring rapid adaptation, such as those found in computing controls and nuclear fusion research.
The work establishes that KANs, when combined with B-spline locality, offer sparse updates and inherent robustness to fixed-point quantization, addressing limitations found in conventional Multi-Layer Perceptrons (MLPs). KAN-based online learners were implemented on Field-Programmable Gate Arrays (FPGAs) and shown to be more efficient and expressive than MLPs across resource-constrained tasks.
Specifically, the B-spline locality within KANs reduces computational demands and latency, maintaining both forward and backward passes within the sub-100 nanosecond range. Experiments involving digit classification with simulated sensor drift confirmed the ability of online KANs to adapt to changing data distributions, outperforming static, pretrained models.
Scalability tests further indicated that KANs can maintain ultrafast performance even with models exceeding 100,000 trainable parameters. The authors acknowledge that while KANs learn more slowly than some alternatives, they maintain stability under continued randomization. Future research may focus on further optimising learning speed while preserving the demonstrated robustness and efficiency. These findings suggest a clear path towards deploying adaptive, low-latency machine learning directly on-chip, enabling real-time control and adaptation in demanding applications.
👉 More information
🗞 Ultrafast On-chip Online Learning via Spline Locality in Kolmogorov-Arnold Networks
🧠 ArXiv: https://arxiv.org/abs/2602.02056
