Embodied Artificial Intelligence is rapidly gaining prominence as it integrates AI with the physical world. Researchers Tong Xie from School of Integrated Circuits, Peking University, Yijiahao Qi from School of EECS, Peking University, and Jinqi Wen from School of EECS, Peking University, et al., have identified a critical challenge in deploying these systems: their substantial computational demands, particularly for portable, battery-powered devices. Their new work addresses the trade-off between energy efficiency and reliability, demonstrating that reducing operating voltage, a common energy-saving tactic, can introduce errors leading to task failure. The team proposes CREATE, a novel cross-layer resilience framework that synergistically optimises energy use and reliability, achieving an average of 40.6% computational energy savings and up to a 37.3% reduction in chip-level energy consumption, ultimately extending battery life by up to 30%. This research marks a significant step towards practical, robust, and efficient embodied AI agents capable of operating in real-world environments.
Modern embodied AI frequently integrates a Large Language Model (LLM)-based planner for high-level task management with a reinforcement learning (RL)-based controller for precise action execution, enabling agents to navigate complex real-world scenarios. However, deploying these agents locally on battery-powered devices presents significant challenges due to substantial computational demands. The research team conducted a comprehensive error injection study on contemporary embodied AI systems, revealing an inherent, yet heterogeneous, fault tolerance.
Building upon this crucial insight, they developed an anomaly detection and clearance mechanism operating at the circuit level, effectively eliminating outlier errors before they propagate. Simultaneously, at the model level, the scientists proposed a weight-rotation-enhanced planning algorithm designed to bolster the fault tolerance of the LLM-based planner itself. This innovative approach ensures more robust task execution even in the presence of hardware imperfections. Furthermore, the study introduces an application-level technique called autonomy-adaptive voltage scaling, which dynamically adjusts the operating voltage of the controllers.
A co-designed voltage scaling circuit facilitates this online voltage adjustment, allowing for real-time optimisation of energy consumption. Extensive experimentation demonstrates that CREATE achieves an average of 40.6% computational energy savings compared to systems operating at nominal voltage, and a 35.0% improvement over existing state-of-the-art techniques. This translates into chip-level energy savings ranging from 29.5% to 37.3%, and a corresponding 15% to 30% increase in battery life for embodied AI devices. The work establishes a cross-layer resilience characterisation and optimisation framework, paving the way for more sustainable and dependable AI-powered robots and autonomous systems operating in diverse environments. This research opens possibilities for extending the operational lifespan and capabilities of embodied AI in applications ranging from industrial automation to search-and-rescue operations.
Error Injection Reveals Heterogeneous AI Resilience
Scientists pioneered CREATE, a design principle leveraging heterogeneous resilience for synergistic energy-reliability co-optimization in embodied AI systems. The research team first conducted a comprehensive error injection study on modern embodied AI, revealing inherent but heterogeneous fault tolerance across system layers. Experiments employed systematic error injection to characterise resilience, demonstrating that while both the LLM-based planner and reinforcement learning controller exhibit good error robustness at low bit error rates (BERs), specifically ≤10^-7, the controller displays significantly higher resilience at elevated BERs, ranging from 10^-7 to 10^-3. The study analysed resilience variation within different network components of both the planner and controller, alongside dependencies on subtasks and execution status.
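To illustrate what injecting errors at a given BER means, here is a minimal sketch, assuming INT8 values and a uniform error model in which every bit flips independently with probability equal to the BER. The function name inject_bit_errors and the random test data are hypothetical and are not taken from the paper.

```python
import random


def inject_bit_errors(values, ber, rng, bits=8):
    """Flip each bit of each signed INT8 value independently with probability `ber`."""
    corrupted = []
    for v in values:
        raw = v & 0xFF  # view the signed value as a two's-complement byte
        for bit in range(bits):
            if rng.random() < ber:
                raw ^= 1 << bit  # flip this bit
        # convert the byte back to a signed integer in [-128, 127]
        corrupted.append(raw - 256 if raw >= 128 else raw)
    return corrupted


rng = random.Random(0)  # fixed seed so runs are repeatable
weights = [rng.randint(-128, 127) for _ in range(10_000)]  # made-up INT8 data

for ber in (1e-7, 1e-5, 1e-3):
    noisy = inject_bit_errors(weights, ber, rng)
    changed = sum(a != b for a, b in zip(weights, noisy))
    print(f"BER={ber:.0e}: {changed} of {len(weights)} values corrupted")
```

A flip in a high-order bit changes a value far more than a flip in a low-order bit, which is one reason a small number of bit errors can still produce large numerical outliers.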
Researchers discovered that systematic activation outliers within the LLM are the primary cause of the planner’s diminished resilience at higher BERs, exacerbated by subsequent normalization operations. Conversely, the controller exhibited varying resilience patterns contingent on specific subtasks and action steps during task execution. Building upon these insights, the team engineered an anomaly detection and clearance mechanism at the circuit level to suppress large errors stemming from timing violations, establishing a robust foundation for further optimisation. To address persistent, smaller errors originating from the LLM planner’s activation distributions, scientists developed a weight-rotation-enhanced planning algorithm.
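The article does not describe how the circuit-level mechanism is built, so the following is only a software analogy of the detect-and-clear idea. It assumes a known magnitude bound for legitimate outputs and assumes that an out-of-range value is replaced with zero; the function name clear_anomalies, the bound, and the sample values are all hypothetical.

```python
def clear_anomalies(outputs, bound):
    """Zero out any value whose magnitude exceeds `bound`.

    Returns the cleaned list and the number of values that were cleared.
    """
    cleaned = []
    cleared = 0
    for y in outputs:
        if abs(y) > bound:
            cleaned.append(0)  # assumed policy: drop the suspect value
            cleared += 1
        else:
            cleaned.append(y)
    return cleaned, cleared


# Made-up accumulator outputs; one has an erroneous high-order bit set.
outputs = [12, -40, 37 + 2**20, 5, -19]
cleaned, cleared = clear_anomalies(outputs, bound=1024)
print(cleaned, cleared)  # [12, -40, 0, 5, -19] 1
```

A check like this only catches errors large enough to leave the expected range. Smaller errors pass through, which is the gap the weight-rotation-enhanced planning algorithm is meant to address.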
The weight-rotation technique redistributes activation patterns within the LLM, improving robustness and performance. Furthermore, the research introduced autonomy-adaptive voltage scaling at the application level, dynamically adjusting the controller’s operating voltage based on the current subtask execution status. A custom circuit was co-designed to facilitate online voltage adjustment in systolic arrays and low-dropout regulators, enabling precise control. Extensive experiments confirmed that CREATE achieves an average of 40.6% computational energy savings compared to nominal-voltage baselines and 35.0% over prior-art techniques, without compromising task quality. This energy reduction translates to 29.5% to 37.3% chip-level energy savings. Experiments demonstrated that the controller exhibits significantly higher resilience to bit errors, specifically at bit error rates ranging from 10^-7 to 10^-3, compared to the LLM-based planner. This disparity in resilience formed the foundation for CREATE’s synergistic, cross-layer optimization strategy.
The core of CREATE lies in three key innovations: anomaly detection and clearance at the circuit level, weight-rotation-enhanced planning at the model level, and autonomy-adaptive voltage scaling at the application level. At the circuit level, the team implemented a mechanism to suppress large errors induced by timing violations, establishing a robust foundation for further optimizations. Further analysis revealed that systematic activation outliers within the LLM planner, combined with normalization operations, contribute to its poor resilience at higher bit error rates. To address this, scientists proposed weight-rotation-enhanced planning, which redistributes LLM activations to improve robustness and maintain task quality.
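The article does not spell out the weight-rotation algorithm, so the sketch below shows only the general identity that rotation-based methods rely on, under the assumption that an orthogonal matrix is folded into the weights. Multiplying activations by an orthogonal matrix H and the weights by its transpose leaves the layer output unchanged, while spreading a single large activation across several channels. The 4x4 normalized Hadamard matrix and all numbers are illustrative choices, not values from the paper.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


# Normalized 4x4 Hadamard matrix: orthogonal, so H times its transpose is I.
H = [[0.5 * s for s in row] for row in [
    [1,  1,  1,  1],
    [1, -1,  1, -1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
]]

x = [[100.0, 1.0, -1.0, 2.0]]  # made-up activation row with one outlier
W = [[0.5, -1.0], [1.5, 2.0], [-0.5, 0.25], [1.0, -2.0]]  # made-up weights

x_rot = matmul(x, H)             # rotated activations
W_rot = matmul(transpose(H), W)  # rotation folded into the weights

y_ref = matmul(x, W)
y_rot = matmul(x_rot, W_rot)

print("largest activation before:", max(abs(v) for v in x[0]))
print("largest activation after: ", max(abs(v) for v in x_rot[0]))
print("outputs match:", all(abs(a - b) < 1e-9 for a, b in zip(y_ref[0], y_rot[0])))
```

Because the rotation can be applied to the weights ahead of time, the flatter activation distribution comes without changing what the layer computes.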
Measurements confirm that CREATE achieves an average of 40.6% computational energy savings compared to systems operating at nominal voltage. Furthermore, the new design principle surpasses prior-art techniques by 35.0% in energy efficiency. These gains translate directly into chip-level energy savings of 29.5% to 37.3% and a 15% to 30% improvement in battery life, all at iso-task quality. The team also customized a circuit for dynamic voltage scaling in systolic arrays and low-dropout regulators (LDOs) to holistically implement these optimizations. Researchers observed that the controller’s resilience varies across subtasks and action steps during task execution.
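The idea behind autonomy-adaptive voltage scaling, where the controller's supply voltage follows the execution status of the current subtask, can be illustrated with a toy policy. Everything in this sketch is an assumption made for illustration: the status labels, the voltage values, the nominal voltage, and the function names are hypothetical and do not come from the paper. The only physical fact used is that dynamic switching energy scales roughly with the square of the supply voltage.

```python
NOMINAL_V = 0.9  # hypothetical nominal supply voltage, in volts

# Hypothetical policy table: execution status -> supply voltage (volts).
VOLTAGE_BY_STATUS = {
    "error_tolerant": 0.70,  # steps assumed to tolerate more bit errors
    "moderate": 0.80,
    "critical": 0.90,        # steps assumed to need nominal voltage
}


def select_voltage(status):
    """Pick a voltage for the current step; unknown statuses get nominal."""
    return VOLTAGE_BY_STATUS.get(status, NOMINAL_V)


def relative_dynamic_energy(voltage, nominal=NOMINAL_V):
    """Dynamic switching energy relative to nominal, using E ~ V^2."""
    return (voltage / nominal) ** 2


# Made-up sequence of execution statuses for one task.
trace = ["error_tolerant", "error_tolerant", "moderate", "critical"]
energy = sum(relative_dynamic_energy(select_voltage(s)) for s in trace) / len(trace)
print(f"average dynamic energy relative to nominal: {energy:.3f}")
```

Falling back to nominal voltage for an unrecognised status is a deliberately conservative choice here: the policy should lose energy savings rather than reliability when it is unsure.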
The autonomy-adaptive voltage scaling technique dynamically adjusts the operating voltage of the controller based on the demands of the current subtask, maximizing efficiency. Tests show that by aggressively lowering the operating voltage, CREATE delivers substantial energy reductions without sacrificing the reliability or performance of embodied AI agents in complex tasks. It achieves an average of 40.6% computational energy savings compared to nominal-voltage baselines and 35.0% over existing techniques, translating to 29.5% to 37.3% chip-level energy savings and a 15% to 30% improvement in battery life. The authors acknowledge limitations including the use of a uniform error model and INT8 quantization, which may not fully capture the complexity of real-world error distributions. Their resilience characterization revealed the controller exhibits greater error tolerance than the planner, with the planner’s single invocation for multiple steps making it more susceptible to errors. Future research directions involve exploring more sophisticated error models and extending the application of autonomy-adaptive voltage scaling to other components within the embodied AI system. These findings represent a significant step towards enabling more efficient and robust embodied AI agents for real-world applications.
👉 More information
🗞 CREATE: Cross-Layer Resilience Characterization and Optimization for Efficient yet Reliable Embodied AI Systems
🧠 ArXiv: https://arxiv.org/abs/2601.14140
