Deploying artificial intelligence on tiny, power-sipping IoT devices presents a significant hurdle, often demanding bespoke hardware for each AI model. Researchers led by Ajay Kumar M and Cian O’Mahoney from University College Dublin, together with colleagues including Pedro Kreutz Werle, have introduced MARVEL, a framework designed to address this challenge. This automated system generates custom RISC-V processor extensions tailored to classes of deep neural networks, with a focus on convolutional networks, enabling efficient AI deployment on severely resource-constrained devices. Unlike existing tools, MARVEL creates hardware and software that operate without an operating system or complex software dependencies, streamlining deployment and reducing overhead. The team demonstrates a two-fold increase in inference speed and a corresponding reduction in energy consumption, at the cost of a modest increase in hardware area, when implemented on a modern FPGA platform. This advance promises to unlock the potential of AI in a wider range of embedded applications, from smart sensors to wearable devices, by overcoming the limitations of current deployment pipelines.
Existing accelerator-generation tools struggle to address the extreme resource limitations faced by IoT endpoints operating without an operating system. Consequently, this work investigates methods for automatically generating custom hardware accelerators optimised for both performance and resource utilisation, enabling the deployment of sophisticated AI capabilities on even the most limited IoT platforms.
AI Acceleration on RISC-V for Edge Computing
This research details work on AI-enhanced RISC-V cores, specifically for edge computing applications. The core focus lies in improving the performance and efficiency of deep learning models on resource-constrained edge devices. Researchers have developed a system-level framework, LiteAIR5, for designing, modeling, and evaluating AI-extended RISC-V cores, providing a valuable tool for researchers and developers working on edge AI applications. The work demonstrates performance improvements and enhanced energy efficiency compared to existing solutions.
Automated Hardware Acceleration for Deep Learning
Researchers have developed a new automated framework, MARVEL, that addresses the challenge of deploying deep neural networks on resource-constrained IoT devices. Unlike existing approaches, MARVEL generates custom RISC-V instruction set extensions tailored to specific classes of deep learning models, particularly convolutional neural networks. The system profiles high-level Python-based models and automatically creates both the specialized hardware and the necessary compiler tools for efficient execution, eliminating the need for an operating system. This end-to-end approach significantly improves performance and efficiency, achieving up to a two-fold increase in inference speed and a two-fold reduction in energy consumption when tested on a Zynq UltraScale+ FPGA platform.
While other methods focus on optimizing specific aspects, MARVEL uniquely combines hardware and software co-design through automated profiling and extension generation. Evaluations across a range of popular models, including LeNet-5, MobileNet, ResNet, and DenseNet, demonstrate the framework’s versatility and effectiveness. Although the specialized hardware introduces a 28.23% increase in area, the substantial gains in speed and energy efficiency represent a significant advancement for edge AI applications where resources are severely limited.
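The profile-then-specialise flow described above can be sketched in outline. The snippet below is a minimal illustration, not MARVEL’s actual implementation: it assumes a toy representation of a CNN as a list of layer descriptors and tallies multiply-accumulate (MAC) operations per layer type, the kind of profile that would expose which operations dominate and thus justify a custom RISC-V instruction. All names here (`profile_macs`, the layer dictionaries) are hypothetical.

```python
# Hypothetical sketch of model-class-aware profiling (not MARVEL's code).
# A CNN is represented as a list of layer descriptors; aggregating MAC
# counts per layer type shows which operations dominate and are therefore
# candidates for a custom RISC-V instruction-set extension.
from collections import Counter

def macs_for_layer(layer):
    """Multiply-accumulate count for one layer descriptor."""
    if layer["type"] == "conv2d":
        h, w = layer["out_hw"]
        return h * w * layer["out_ch"] * layer["in_ch"] * layer["k"] ** 2
    if layer["type"] == "dense":
        return layer["in_features"] * layer["out_features"]
    return 0  # pooling/activation layers contribute negligible MACs

def profile_macs(model):
    """Aggregate MACs by layer type to expose acceleration candidates."""
    totals = Counter()
    for layer in model:
        totals[layer["type"]] += macs_for_layer(layer)
    return totals

# A LeNet-5-like toy model (layer shapes are illustrative only).
lenet = [
    {"type": "conv2d", "out_hw": (28, 28), "in_ch": 1, "out_ch": 6, "k": 5},
    {"type": "conv2d", "out_hw": (10, 10), "in_ch": 6, "out_ch": 16, "k": 5},
    {"type": "dense", "in_features": 400, "out_features": 120},
    {"type": "dense", "in_features": 120, "out_features": 84},
    {"type": "dense", "in_features": 84, "out_features": 10},
]

profile = profile_macs(lenet)
# Convolutions dominate the MAC budget here, which is why a conv-focused
# fused-MAC extension would pay off for this model class.
```

In the framework described by the authors, this kind of profile would be extracted automatically from the high-level Python model and would then drive generation of both the hardware extension and the matching bare-metal compiler support.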
Automated Hardware for Efficient Edge AI
This research presents MARVEL, an automated framework that generates custom RISC-V processor extensions specifically designed for convolutional neural networks (CNNs) and targeted at resource-constrained IoT devices. The framework bridges the gap between high-level Python-based AI models and low-level bare-metal C implementations, enabling efficient deployment without the need for an operating system. By automating the process of hardware customization, MARVEL achieves up to a two-fold improvement in inference speed and energy efficiency, with a modest 28.23% increase in hardware area when implemented on a Zynq UltraScale+ FPGA platform. The authors plan future work to refine power estimation accuracy, explore alternative baseline RISC-V cores, and expand support for a wider range of deep learning models and quantization levels, further enhancing hardware-software co-design for edge AI.
👉 More information
🗞 MARVEL: An End-to-End Framework for Generating Model-Class Aware Custom RISC-V Extensions for Lightweight AI
🧠 ArXiv: https://arxiv.org/abs/2508.01800
