Researchers are tackling the challenge of controlling high-speed robotics using biologically plausible neural networks. Irene Ambrosini, Ingo Blakowski, and Dmitrii Zendrikov, from the Institute of Neuroinformatics, UZH and ETH Zurich, alongside Cristiano Capone and colleagues, demonstrate a novel approach by training a network of slow silicon neurons to play air hockey. Their work is significant because it achieves real-time learning and successful robotic control within a remarkably small number of training trials, enabled by co-design of the hardware and the learning algorithm. This research bridges the gap between neuroscience-inspired computing and practical robotic systems, suggesting brain-inspired methods can effectively manage fast-paced interactions and enable continuous learning in intelligent machines.
Spiking network learns air hockey in real time on neuromorphic hardware
This breakthrough establishes real-time learning within a setup comprising a computer and the neuromorphic chip in-the-loop, enabling practical training of spiking neural networks for autonomous robotic systems. The study bridges neuroscience-inspired hardware and real-world robotic control, proving that brain-inspired approaches can effectively address fast-paced interaction tasks. Furthermore, the research supports always-on learning in intelligent machines, potentially revolutionising how robots adapt and operate in dynamic environments. The system operates with a 6D continuous state space, comprising puck position, puck velocity, and striker coordinates across a 1.038 × 1.948 m workspace, a significant leap beyond simplified benchmarks.
The work addresses key scalability challenges by moving beyond toy reinforcement learning problems to a physical robotic platform with adaptive-precision continuous state encoding. Researchers demonstrated neuromorphic reinforcement learning for continuous motion primitives executing ballistic trajectories at 50 Hz, demanding predictive decisions rather than frame-level responses. With puck positions randomized and velocities ranging from 1.0 to 1.5 m/s, the system achieved 96 to 98% success over 2000 episodes, showcasing robust learning and adaptation capabilities. Experiments show the system's ability to handle the higher dimensionality, physical constraints, and temporal dynamics inherent in real-world robotic control. The platform utilises an anthropomorphic arm on a standard air-hockey table, introducing a larger workspace and increased kinematic complexity. This research complements existing neuromorphic robotics efforts focused on event-based vision and spiking convolutional neural networks, paving the way for efficient event-driven perception coupled with adaptive decision-making in autonomous systems.
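The paper's "adaptive-precision continuous state encoding" is not fully specified in this summary; a common way to feed a continuous state into a spiking network is Gaussian population coding, sketched below under that assumption. The function name, neuron counts, tuning widths, and velocity bounds are illustrative, loosely matched to the table dimensions and puck speeds reported above.

```python
import numpy as np

def population_encode(x, lo, hi, n_neurons=16, width=None):
    """Encode a scalar x from [lo, hi] as target firing rates of
    n_neurons with Gaussian tuning curves tiled across the range.
    A narrower [lo, hi] gives finer effective resolution for the
    same neuron count (the 'adaptive precision' idea)."""
    centers = np.linspace(lo, hi, n_neurons)
    if width is None:
        width = (hi - lo) / n_neurons
    return np.exp(-0.5 * ((x - centers) / width) ** 2)

# 6D state: puck position (x, y), puck velocity (vx, vy), striker (x, y).
bounds = [(0.0, 1.038), (0.0, 1.948), (-1.5, 1.5), (-1.5, 1.5),
          (0.0, 1.038), (0.0, 1.948)]
state = [0.50, 1.00, 0.80, -0.30, 0.50, 0.20]
rates = np.concatenate([population_encode(s, lo, hi)
                        for s, (lo, hi) in zip(state, bounds)])
# rates holds 6 x 16 = 96 values in [0, 1], ready to drive input
# spike generators at proportional rates.
```

Narrowing a dimension's bounds concentrates the same 16 neurons over a smaller interval, which is consistent with the encoding-range results discussed later: narrower ranges converge faster and more accurately.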
Spiking networks and reinforcement learning for air hockey
This setup, comprising the chip in-the-loop, enables practical training of spiking neural networks for robotic autonomous systems, bridging neuroscience-inspired hardware with real-world robotic control. The study pioneered biologically plausible “awake” and “dreaming” reinforcement learning phases, initially demonstrated on Atari Pong and subsequently extended to real-time hardware on the DYNAP-SE chip. At the core of this advancement are spiking neural networks, which model neurons as leaky integrate-and-fire units communicating through discrete spikes, a mechanism crucial for energy efficiency and temporal coding. Researchers implemented deep reinforcement learning algorithms, including DQN and TD3, within these spiking neural networks, but moved beyond reliance on non-local learning rules by adopting recent advances in local plasticity.
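As a concrete illustration of the leaky integrate-and-fire dynamics mentioned above, here is a minimal discrete-time LIF neuron; the parameter values are illustrative and not taken from the DYNAP-SE configuration.

```python
def lif_step(v, i_in, tau=20.0, dt=1.0, v_th=1.0, v_reset=0.0):
    """One Euler step of a leaky integrate-and-fire neuron.
    v: membrane potential, i_in: input current, tau: membrane
    time constant (in units of dt), v_th: spike threshold."""
    v = v + (dt / tau) * (-v + i_in)   # leak towards 0, driven by i_in
    if v >= v_th:
        return v_reset, True            # emit a spike, then hard reset
    return v, False

# Drive the neuron with a constant suprathreshold current: it fires
# periodically, which is the basis of rate coding.
v, spikes = 0.0, 0
for _ in range(200):
    v, fired = lif_step(v, i_in=1.5)
    spikes += fired
# With these parameters the membrane charges towards 1.5, crosses the
# threshold every 22 steps, and resets: 9 spikes in 200 steps.
```

The discrete spike events, rather than continuous activations, are what make this neuron model a natural fit for the event-driven, energy-efficient hardware described above.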
Adopting local plasticity rules enabled online learning in recurrent SNNs suitable for neuromorphic hardware, a critical step towards scaling to more complex tasks. Experiments employ the MuJoCo implementation of the Air Hockey environment, featuring a planar table and an anthropomorphic manipulator controlling a mallet-shaped end-effector. The agent observes the puck's 2D position and velocity, alongside its end-effector position, and must intercept the puck sliding across a 1.038 × 1.948 m workspace. The control loop operates at 50 Hz, with the agent selecting one of two discrete actions corresponding to pre-defined motion primitives executed as open-loop trajectories using spline velocity profiles.
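The 50 Hz decision loop with two pre-defined motion primitives can be sketched as follows. The `env` and `policy` interfaces are hypothetical stand-ins, not the actual MuJoCo Air Hockey API, and the bell-shaped curve merely stands in for the paper's spline velocity profiles.

```python
import numpy as np

DT = 1.0 / 50.0  # 50 Hz control period

def spline_velocity_profile(t, duration):
    """Bell-shaped speed profile: zero at both endpoints, peaking
    mid-primitive. Its average over [0, duration] is 1, so scaling
    it sets the primitive's total travel distance."""
    s = np.clip(t / duration, 0.0, 1.0)
    return 30.0 * s**2 * (1.0 - s)**2

def run_episode(env, policy, max_steps=500):
    """Closed-loop episode at 50 Hz: observe the 6D state, let the
    policy pick one of two discrete primitives, execute it open-loop,
    repeat until the episode ends."""
    obs = env.reset()   # (puck_x, puck_y, puck_vx, puck_vy, ee_x, ee_y)
    total = 0.0
    for _ in range(max_steps):
        action = policy(obs)                  # 0 or 1: which primitive
        obs, reward, done = env.step(action)  # env integrates the primitive
        total += reward
        if done:
            break
    return total
```

Because each primitive runs open-loop once selected, the policy must commit to a ballistic trajectory ahead of the interception, which is why the task demands predictive rather than frame-level decisions.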
The chip-in-the-loop methodology allows for rigorous testing of the neuromorphic hardware's capabilities in a dynamic, closed-loop robotic control application, achieving 96 to 98% success over 2000 episodes with randomized puck positions and velocities ranging from 1.0 to 1.5 m/s. The team extended a neuromorphic RL framework originally demonstrated on Atari Pong to the challenging domain of physical robot manipulation, addressing scalability by moving beyond toy problems to real-world air hockey with adaptive-precision continuous state encoding. They demonstrated neuromorphic RL for continuous motion primitives executing ballistic trajectories at 50 Hz, requiring predictive decisions rather than frame-level responses, and achieved generalization under uncertainty through randomization of puck parameters.
Robotic Air Hockey Learns via Spiking Neurons
The research successfully scales a neuromorphic reinforcement learning framework from a 2D pixel game to a physical 3D robotic task, increasing network input dimensionality from 4 to 6 inputs and adapting control from single-step actions to composed motion primitives. Experiments revealed a 100% success rate within 200 trials for a stationary puck positioned 1.0 m from the robot, establishing a baseline for performance evaluation. The team measured task-level generalization under varying initial puck conditions, achieving 100% success after 1000 episodes with a constant-speed lateral launch from the table edge. Introducing speed variability, with velocities ranging from 1.0 to 1.5 m/s, extended learning time, but success stabilized above 96% after 1500 episodes.
Randomizing both initial position (within a 0.10m window) and speed yielded the highest asymptotic performance, exceeding 98% after 1300 episodes, suggesting the broader state distribution prevented overfitting. These results demonstrate the viability of event-driven e-prop and reservoir architectures for low-power, predictive control in fast real-world robotics. Encoding-range scalability tests, conducted with 1020 silicon neurons, showed a narrow velocity range of [0.7, 0.9] m/s achieved greater than 97% success within approximately 150 episodes. A medium range of [0.7, 1.2] m/s required around 700 episodes to reach similar performance, while the widest range of [0.7, 1.5] m/s resulted in a modest 4% drop in asymptotic success, from 97% to 93%.
Data shows that wider input ranges increase convergence time and slightly reduce performance, consistent with the finite resolution of the fixed-size network. Before training, the agent exhibited near-random action selection with high variability, but learning transformed this stochastic exploration into a deterministic, temporally precise strategy. This consolidation minimized timing variance and enabled robust generalization across varied initial conditions, demonstrating the network's capacity to extract coherent, reliable policies from noisy state observations. The work achieved stable performance within 1500 to 2000 episodes using 1020 DYNAP-SE neurons, surpassing simulation-based bio-inspired RL that used 10,000 neurons.
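The event-driven e-prop rule referenced above combines a per-synapse local eligibility trace with a broadcast learning signal, avoiding backpropagation through time. The following rate-based sketch illustrates only the three-factor structure; it is not the paper's spiking hardware implementation, and the network sizes, decay constant, and learning signal are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_rec = 6, 32                       # 6D state in, small recurrent pool
W = rng.normal(0.0, 0.1, (n_rec, n_in))   # input-to-recurrent weights
trace = np.zeros_like(W)                  # per-synapse eligibility traces
alpha, lr = 0.9, 1e-2                     # trace decay, learning rate

for t in range(100):
    x = rng.uniform(-1.0, 1.0, n_in)      # toy presynaptic activity
    h = np.tanh(W @ x)                    # postsynaptic activity
    # Local eligibility: low-pass filter of the product of the
    # postsynaptic derivative (1 - h^2 for tanh) and presynaptic x.
    # Both factors are available at the synapse, so no backprop
    # through time is needed.
    trace = alpha * trace + np.outer(1.0 - h**2, x)
    # Third factor: a broadcast learning signal (noise here; in RL it
    # would be a reward-prediction or policy-gradient term).
    L = rng.normal(0.0, 1.0, n_rec)
    W += lr * L[:, None] * trace          # three-factor weight update
```

Because every quantity in the update is either local to the synapse or a scalar-per-neuron broadcast, this style of rule maps naturally onto neuromorphic chips where global gradient transport is unavailable.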
Neuromorphic Air Hockey Control with 1020 Neurons
Scientists have demonstrated successful robotic manipulation using a small spiking neural network comprising only 1020 neurons. The research bridges neuroscience-inspired hardware with real-world robotic control, showcasing the potential of brain-inspired approaches for fast-paced interaction tasks. Notably, the network achieved better performance than simulations using ten times the number of neurons, highlighting the efficiency of the neuromorphic hardware. However, the authors acknowledge limitations related to the fixed size of the silicon network, which currently constrains the range of puck velocities it can reliably handle, with success rates decreasing from 97% to 86% as velocity ranges broaden. Future work could address this by utilising all available processor cores, implementing offline learning mechanisms, or integrating event-camera input for improved latency and robustness, potentially validating the system on a platform like iCub under real-world conditions.
👉 More information
🗞 Training slow silicon neurons to control extremely fast robots with spiking reinforcement learning
🧠 ArXiv: https://arxiv.org/abs/2601.21548
