Cryptographic operations underpin the security of increasingly prevalent Internet of Things (IoT) devices and edge computing systems, yet current RISC-V platforms often struggle to deliver both comprehensive algorithm support and efficient hardware acceleration. Anh Kiet Pham from Nara Institute of Science and Technology, Van Truong Vo from University of Information Technology, Ho Chi Minh City, and Vu Trung Duong Le et al. address this challenge with Crypto-RV, an FPGA-based RISC-V co-processor. The design unifies support for a broad range of cryptographic algorithms, including SHA-256, AES-128, and Haraka-256 (a hash used in post-quantum signature schemes such as SPHINCS+), within a single, streamlined architecture. Implemented and evaluated on a Xilinx ZCU102 FPGA, Crypto-RV delivers speedups of up to 1,061x and energy-efficiency improvements of up to 17.4x compared to conventional CPUs, offering a viable solution for resource-constrained IoT applications.
The breakthrough lies in three key architectural innovations designed to maximize performance and efficiency.
These are a high-bandwidth internal buffer holding 128 64-bit entries, cryptography-specialized execution units with four-stage pipelined datapaths, and a double-buffering mechanism with adaptive scheduling. The co-processor also improves energy efficiency by 5.8x to 17.4x relative to conventional CPUs.
The compact design, occupying only 34,704 Look-Up Tables (LUTs), 37,329 Flip-Flops (FFs), and 22 Block RAMs (BRAMs), confirms its suitability for resource-constrained environments. The internal buffer serves as high-bandwidth storage for constants and intermediate values, reducing reliance on external memory accesses. Custom data-movement instructions enable bulk transfers of up to 128 words, decoupling intra-round data reuse from slower off-chip communication.
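The buffer and its bulk transfers can be pictured with a small software model. This is only a sketch: the class name InternalBuffer and the methods bulk_load and bulk_store are hypothetical illustrations, not the co-processor's actual instructions, and the only figures taken from the article are the 128-entry depth, the 64-bit width, and the 128-word transfer limit.

```python
WORD_BITS = 64        # entry width stated in the article
BUFFER_WORDS = 128    # number of entries stated in the article
WORD_MASK = (1 << WORD_BITS) - 1


class InternalBuffer:
    """Toy model of a 128 x 64-bit on-chip buffer (hypothetical API)."""

    def __init__(self):
        self.words = [0] * BUFFER_WORDS

    def bulk_load(self, base, values):
        """Copy up to 128 words into the buffer starting at index `base`."""
        values = list(values)
        if base < 0 or base + len(values) > BUFFER_WORDS:
            raise ValueError("transfer exceeds the 128-word buffer")
        for offset, value in enumerate(values):
            # Truncate to 64 bits, as a fixed-width hardware entry would.
            self.words[base + offset] = value & WORD_MASK

    def bulk_store(self, base, count):
        """Read `count` consecutive words back out of the buffer."""
        if base < 0 or count < 0 or base + count > BUFFER_WORDS:
            raise ValueError("transfer exceeds the 128-word buffer")
        return self.words[base:base + count]


buf = InternalBuffer()
buf.bulk_load(0, range(16))                 # 16 x 64-bit words = one 1024-bit block
capacity_bits = BUFFER_WORDS * WORD_BITS    # total capacity in bits
blocks_1024 = capacity_bits // 1024         # how many 1024-bit blocks fit
```

Under these figures the buffer holds 8,192 bits, enough for eight 1024-bit blocks, which gives a sense of how much round state and constant data can stay on-chip between off-chip transfers.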
The design achieves a 17.42x to 58.15x reduction in latency compared to baseline RISC-V cores by maintaining all intermediate states on-chip throughout round sequences. The Cryptography Specialized Unit comprises three unified engines executing multiple algorithms via deeply pipelined datapaths. A unified SM3/SHA-256/SHA-512 engine shares functional units across these algorithms, employing a Message Expander to process input words and a Message Compressor with shared adders and multiplexers.
The SM3/SHA-256/SHA-512 engine operates as a four-stage pipeline, processing one 1024-bit block per cycle in SHA-512 mode or two 512-bit blocks in parallel for SHA-256 and SM3, maximizing resource utilization. Alongside the 128×64-bit internal buffer array and the four-stage pipelined execution units, the architecture incorporates a double-buffering mechanism with adaptive scheduling.
This combination optimizes performance for large-hash workloads and significantly reduces memory bottlenecks commonly found in conventional RISC-V cores. Specifically, the internal buffer reduces load/store operations, enabling crypto-specialized units to achieve one pipeline iteration per cycle after initial warm-up.
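To see why one iteration per cycle after warm-up matters, consider an idealized cycle count for a four-stage pipeline. The sketch below assumes independent iterations, one cycle per stage, and no stalls; these are simplifying assumptions for illustration, not measurements from the paper, and the function names are hypothetical.

```python
PIPELINE_STAGES = 4  # pipeline depth stated in the article


def pipelined_cycles(iterations, stages=PIPELINE_STAGES):
    """Idealized cycles: fill the pipeline once, then finish one iteration per cycle."""
    if iterations <= 0:
        return 0
    return stages + (iterations - 1)


def unpipelined_cycles(iterations, stages=PIPELINE_STAGES):
    """Idealized cycles if each iteration had to pass all stages before the next starts."""
    if iterations <= 0:
        return 0
    return stages * iterations


# Assumed workload of 64 iterations, chosen only for illustration.
speedup = unpipelined_cycles(64) / pipelined_cycles(64)
```

In this model the warm-up cost is a fixed three extra cycles, so the benefit approaches the pipeline depth as the number of iterations grows.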
Latency reductions of 17.42x to 58.15x were observed compared to baseline RISC-V implementations, achieved through the efficient on-chip storage of intermediate states and message blocks. The cryptography specialized unit features unified engines for SM3/SHA-256/SHA-512 and AES-128/Haraka-256/Haraka-512, sharing functional units to minimize area overhead and maximize resource utilization.
The SM3/SHA-256/SHA-512 unit processes one 1024-bit block per cycle in SHA-512 mode, while in SHA-256/SM3 mode the 32-bit datapath processes two blocks in parallel. Energy efficiency is also substantially improved: Crypto-RV is 5.8x to 17.4x more energy-efficient than powerful CPUs.
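For readers unfamiliar with what a message expander computes, the following sketch shows the standard SHA-256 message schedule from FIPS 180-4, which turns the 16 input words of a 512-bit block into the 64 words consumed by the compression rounds. It is a plain software reference for the algorithm, not a description of how Crypto-RV implements it in hardware; the function names are chosen here for illustration.

```python
MASK32 = 0xFFFFFFFF


def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def small_sigma0(x):
    """SHA-256 sigma0: ROTR7 xor ROTR18 xor SHR3."""
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def small_sigma1(x):
    """SHA-256 sigma1: ROTR17 xor ROTR19 xor SHR10."""
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def expand_message(block_words):
    """Expand 16 32-bit words into the 64-word SHA-256 message schedule."""
    if len(block_words) != 16:
        raise ValueError("a SHA-256 block is exactly 16 32-bit words")
    w = [word & MASK32 for word in block_words]
    for t in range(16, 64):
        # W[t] = sigma1(W[t-2]) + W[t-7] + sigma0(W[t-15]) + W[t-16] (mod 2^32)
        w.append((small_sigma1(w[t - 2]) + w[t - 7]
                  + small_sigma0(w[t - 15]) + w[t - 16]) & MASK32)
    return w


schedule = expand_message([1] + [0] * 15)
```

SHA-512 uses the same recurrence on 64-bit words with different rotation amounts and 80 schedule words, which is presumably part of what makes sharing adders and multiplexers across the hash modes attractive.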
The design occupies a modest 34,704 LUTs, 37,329 FFs, and 22 BRAMs, confirming its viability for resource-constrained IoT environments.
A consistent cycle count of 2.0 to 4.1 cycles per byte across all supported algorithms confirms the effectiveness of the unified architecture in shifting the performance bottleneck from memory access to computational throughput. The authors acknowledge that the current implementation focuses on hash and symmetric encryption algorithms.
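A cycles-per-byte figure converts to throughput once a clock frequency is fixed: throughput equals clock frequency divided by cycles per byte. The sketch below applies that relation to the 2.0 and 4.1 cycles-per-byte endpoints; the 100 MHz clock is a placeholder assumption for illustration, since the article does not state the operating frequency.

```python
def throughput_bytes_per_second(clock_hz, cycles_per_byte):
    """Bytes processed per second at a given clock and cycles-per-byte cost."""
    if cycles_per_byte <= 0:
        raise ValueError("cycles_per_byte must be positive")
    return clock_hz / cycles_per_byte


ASSUMED_CLOCK_HZ = 100e6  # placeholder value, NOT taken from the article

for cpb in (2.0, 4.1):
    mb_per_s = throughput_bytes_per_second(ASSUMED_CLOCK_HZ, cpb) / 1e6
    print(f"{cpb} cycles/byte -> {mb_per_s:.1f} MB/s at the assumed clock")
```

Because the range is narrow, throughput across the supported algorithms differs by only about a factor of two at any given clock, which is consistent with the claim that the bottleneck has moved from memory access to computation.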
Future research will concentrate on extending Crypto-RV to fully accelerate SPHINCS+ signature generation, incorporating specialized tree-hash cores and optimised Merkle-layer state management, thereby establishing a unified platform for quantum-safe IoT security. This work represents a significant step towards providing high-performance, energy-efficient cryptographic processing for emerging applications demanding robust security in constrained environments.
👉 More information
🗞 Crypto-RV: High-Efficiency FPGA-Based RISC-V Cryptographic Co-Processor for IoT Security
🧠 ArXiv: https://arxiv.org/abs/2602.04415
