Bellman Memory Units Enable Synaptic Reinforcement Learning with Evolving Network Topologies on Intel’s Loihi Chip

Developing adaptable and efficient control systems for edge devices presents a significant challenge, particularly under tight constraints on on-device learning and hardware scalability. Shreyan Banerjee, Aasifa Rounak, and Vikram Pakrashi, from the UCD Centre for Mechanics, Dynamical Systems and Risk Laboratory at University College Dublin, address this problem by introducing a novel neuromorphic framework called Bellman Memory Units. Their research incorporates the principles of the Bellman equation directly at the synaptic level of a reinforcement learning algorithm, enabling the network topology to evolve iteratively during training. By implementing this approach on Intel’s Loihi chip, the team demonstrates a method for optimising neuron and synapse counts, potentially leading to more compact and resource-efficient spike-based reinforcement learning accelerators and facilitating on-chip adaptation to previously unseen control scenarios.

Researchers engineered a synaptic Q-learning algorithm where the principles of the Bellman equation are incorporated into the behaviour of individual synapses within a neural network, allowing the network topology to evolve iteratively throughout the training process. The team developed a neuromorphic architecture that models synapses as RC series circuits, effectively convolving input spike trains with a time-dependent kernel to produce an analog post-synaptic current.
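
The synapse model lends itself to a short illustration. The sketch below, in NumPy, convolves a binary spike train with the exponential impulse response of an RC series circuit to produce an analog post-synaptic current; the kernel form k(t) = (r/τ)·exp(-t/τ), the time constant, and the resistance value are illustrative assumptions, not parameters taken from the paper.

```python
import numpy as np

def postsynaptic_current(spike_train, dt=1e-3, tau=20e-3, r=1.0):
    """Analog post-synaptic current from a binary spike train.

    Convolves the spikes with k(t) = (r / tau) * exp(-t / tau), the impulse
    response of an RC series circuit, truncated at five time constants.
    All parameter values here are assumptions for illustration.
    """
    t = np.arange(0.0, 5.0 * tau, dt)
    kernel = (r / tau) * np.exp(-t / tau)              # time-dependent RC kernel
    return np.convolve(spike_train, kernel)[: len(spike_train)] * dt

# A sparse spike train yields a smoothly rising-and-decaying current trace.
spikes = np.zeros(1000)
spikes[[100, 120, 400]] = 1.0
current = postsynaptic_current(spikes)
```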

This approach allows each synapse to implement a non-linear learning rule, which is crucial for training spiking neural networks for control applications. The researchers chose Q-learning, a model-free, off-policy reinforcement learning algorithm, for its simplicity and its reliance on a single neural network: a reward signal optimises the controller’s policy to maximise cumulative future rewards, as defined by the Bellman equation. To test the approach, the team applied their reinforcement learning-based control system to the classic cartpole-balancing problem in the CartPole-v1 simulator, balancing a pole subject to small angular perturbations by applying a force to a moving cart. The state space of the cartpole system was discretised, mapping continuous states such as cart position and pole angle to integer values; each simulation episode concludes when the pole angle exceeds a threshold, the cart position exceeds a limit, or the cartpole is successfully balanced for 250 time steps. This implementation demonstrates the potential for resource reduction on edge devices, supporting the manufacture of compact, application-specific integrated circuits and enabling adaptation to unseen control scenarios.
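
For readers who want the conventional, non-spiking view of the same update, the sketch below discretises a CartPole-v1 observation into bins and applies one tabular Bellman (Q-learning) step; the bin edges, learning rate, and discount factor are assumptions chosen for illustration, not values from the paper.

```python
import numpy as np
from collections import defaultdict

# Illustrative bin edges for the four CartPole-v1 observations; the paper
# discretises the state space, but these particular bin counts are assumptions.
BINS = [
    np.linspace(-2.4, 2.4, 9),     # cart position
    np.linspace(-3.0, 3.0, 9),     # cart velocity
    np.linspace(-0.21, 0.21, 9),   # pole angle (rad)
    np.linspace(-3.0, 3.0, 9),     # pole angular velocity
]

def discretise(obs):
    """Map a continuous observation to a tuple of integer bin indices."""
    return tuple(int(np.digitize(x, edges)) for x, edges in zip(obs, BINS))

# Q-table keyed by discretised state; CartPole has two actions (push left/right).
Q = defaultdict(lambda: np.zeros(2))

def bellman_update(s, a, reward, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))."""
    td_target = reward + gamma * np.max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])
```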

Synaptic Q-Learning Balances Cartpole with Fewer Resources

The team successfully implemented a synaptic Q-learning algorithm for the classic cartpole-balancing problem, demonstrating a novel approach to reinforcement learning on neuromorphic hardware. The Bellman equation, central to Q-learning, is incorporated directly at the synaptic level of a neural network, enabling the network’s topology to evolve iteratively during training. This design allows the architecture to adapt and optimise itself, balancing the number of neurons and synapses to improve performance in spike-based reinforcement learning accelerators. Experiments showed that the proposed architecture can significantly reduce resource utilisation, paving the way for compact, application-specific integrated circuits. The system spawns a neuron for each state encountered, with synapses representing the possible actions and their associated Q-values; each synapse emits a spike train with a frequency proportional to its Q-value, so the optimal action can be selected from the firing rates. The algorithm is deemed to have converged when the total reward reaches 200 for at least 20 consecutive episodes, with each episode terminating when the pole falls over or the cartpole is balanced for 250 steps; a penalty assigned when the pole falls accelerates learning. Discretising the cartpole state space into bins facilitated the implementation, and on-chip learning demonstrated the potential for adapting to unseen control scenarios, offering a promising pathway towards low-power, scalable control systems for edge devices.
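
A minimal sketch of that action-selection mechanism follows, assuming Poisson spike generation and a simple proportional rate map; the gain, window length, and sampling scheme are illustrative, not the paper’s exact on-chip mechanism. It also encodes the convergence criterion stated above.

```python
import numpy as np

rng = np.random.default_rng(0)
neurons = {}  # one neuron per encountered discretised state, spawned lazily

def select_action(state, n_actions=2, window_ms=100, gain_hz=10.0):
    """Choose an action from the firing rates of the state's synapses.

    Each synapse stores one action's Q-value and is assumed to fire at a
    frequency proportional to it; the gain and Poisson sampling here are
    illustrative assumptions, not the paper's exact mechanism.
    """
    if state not in neurons:
        neurons[state] = np.zeros(n_actions)            # topology grows with experience
    rates = gain_hz * np.maximum(neurons[state], 0.0)   # spike frequency proportional to Q-value
    counts = rng.poisson(rates * window_ms * 1e-3)      # spike counts over the window
    return int(np.argmax(counts))                       # highest-firing synapse wins

def converged(episode_rewards, target=200, streak=20):
    """Reported criterion: total reward >= target for `streak` consecutive episodes."""
    return len(episode_rewards) >= streak and all(
        r >= target for r in episode_rewards[-streak:]
    )
```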

On-Chip Learning with Evolving Synapses

This work presents a novel synaptic Q-learning algorithm and its implementation on neuromorphic hardware, specifically Intel’s Loihi chip, to address limitations in edge-device control systems. The researchers integrated the Bellman equation directly at the synaptic level of a neural network, enabling the network topology to evolve iteratively during training; the inherent parallelism of neuromorphic computing is leveraged to optimise resource utilisation and potentially reduce the size and power consumption of application-specific integrated circuits. The team demonstrated the algorithm’s effectiveness on the classic cartpole-balancing problem, achieving on-chip learning that adapts to previously unseen control scenarios and offering a pathway towards more efficient and adaptable edge computing solutions.

👉 More information
🗞 Bellman Memory Units: A neuromorphic framework for synaptic reinforcement learning with an evolving network topology
🧠 ArXiv: https://arxiv.org/abs/2511.16066

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
