The increasing need to secure digital information has driven the development of new cryptographic algorithms, but these advances come at a performance cost because they require significantly larger keys and signatures than current standards. Jingyao Zhang and Elaheh Sadredini, from the University of California, Riverside, and their colleagues address this challenge by exploring a computing architecture that moves processing closer to memory. Their work introduces Crypto-Near-Cache (CNC), a system that places processing elements directly adjacent to cache memory to increase speed and reduce energy consumption. The approach targets the cache-bandwidth limits of traditional computer architectures and aims to accelerate post-quantum cryptographic algorithms, paving the way for more secure and efficient data processing.
Post-Quantum Cryptography and Hardware Acceleration
Research in cryptography increasingly focuses on developing algorithms resistant to attacks from quantum computers, known as post-quantum cryptography. A significant body of work explores accelerating these new algorithms, alongside established methods like AES, using innovative hardware approaches. Many investigations center on in-memory and near-memory computing, techniques that move processing closer to the data to reduce bottlenecks and improve energy efficiency. This research addresses the growing need for secure communication in a future where quantum computers pose a threat to current encryption standards.
Studies demonstrate a clear trend towards optimizing cryptographic performance through specialized hardware, exploring various architectures and emerging non-volatile memory technologies to achieve faster and more efficient encryption and decryption. These efforts encompass both general hardware design principles and specific optimizations for cryptographic kernels, aiming to overcome limitations in traditional computing systems. Investigations also address broader security concerns, such as side-channel attacks and the need for privacy-preserving techniques. By combining algorithmic advancements with hardware acceleration, scientists aim to create robust and secure cryptographic systems capable of protecting sensitive data in an increasingly interconnected world.
Near-Cache Computing Accelerates Post-Quantum Cryptography
Scientists developed a novel near-cache computing paradigm, termed Crypto-Near-Cache (CNC), to accelerate post-quantum cryptographic algorithms. Recognizing that cache bandwidth often limits performance, the team engineered a system that places compute-enabled SRAM arrays directly adjacent to each cache slice. Unlike conventional designs, this architecture achieves high internal bandwidth and minimizes data movement. The team also introduced a method of using virtual addressing within the near-cache environment, which lets CNC integrate with existing computer systems. Researchers observed that input data for cryptographic workloads typically fits within cache blocks, which enables a key insight: the same virtual address can be used for both data movement and computation.
This approach avoids integration challenges while maintaining addressing transparency. The team implemented bit-parallel computing with flexible Computing Blocks (CBs) tailored to the precision requirements of different algorithms, ensuring high performance for critical cryptographic kernels. Because the compute-enabled SRAM arrays sit directly next to the cache slices, the design exploits the cache's high internal bandwidth, while ISA extensions provide support for virtual addressing. Evaluations demonstrate significant improvements in energy efficiency and throughput compared to existing designs, establishing a new framework for cryptographic acceleration.
Crypto-Near-Cache Accelerates Post-Quantum Cryptography
Recent research delivers a novel near-cache computing paradigm, termed Crypto-Near-Cache (CNC), designed to accelerate post-quantum cryptographic algorithms and other demanding applications. This work addresses performance bottlenecks caused by limited cache bandwidth by integrating SRAM arrays with bitline computing capability directly near cache slices, achieving high internal bandwidth and short data movement with native support for virtual addressing. The CNC utilizes a physical array that is logically reconfigured to suit different algorithms, maximizing parallelism. Experiments demonstrate that the physical array can be partitioned to support a variety of cryptographic algorithms with 100% utilization.
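To make the idea of logically partitioning one physical array more concrete, the following Python sketch computes how many computing blocks of a given width fit side by side and what fraction of the columns they cover. The array width of 256 columns and the function and class names are assumptions made purely for illustration and are not taken from the paper. The 8-bit and 64-bit widths reflect the standard AES byte and Keccak-1600 lane sizes, while the 32-bit NTT width is an illustrative guess.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    block_width: int    # columns per computing block (CB)
    num_blocks: int     # CBs that fit side by side in the array
    utilization: float  # fraction of array columns covered by CBs


def partition_array(total_columns: int, block_width: int) -> Partition:
    """Split a row of `total_columns` bit columns into equal-width CBs."""
    if total_columns <= 0 or block_width <= 0:
        raise ValueError("column counts must be positive")
    num_blocks = total_columns // block_width
    used_columns = num_blocks * block_width
    return Partition(block_width, num_blocks, used_columns / total_columns)


# Hypothetical array width; the real CNC dimensions are not given here.
ARRAY_COLUMNS = 256

# Assumed operand widths per algorithm (the NTT width is illustrative only).
WIDTHS = {"AES": 8, "Keccak-1600": 64, "NTT": 32}

for name, width in WIDTHS.items():
    p = partition_array(ARRAY_COLUMNS, width)
    print(f"{name}: {p.num_blocks} CBs of {p.block_width} bits, "
          f"utilization {p.utilization:.0%}")
```

Under these assumed numbers every width divides the array evenly, which is the condition for full utilization; a width that does not divide the column count would leave some columns idle.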
For AES, the array is organized as computing blocks (CBs), enabling parallel operations. Keccak-1600 and NTT algorithms also utilize specific CB configurations to accommodate their data requirements, enabling parallel computation across an extended data width. The research team developed a command structure to control the CNC, facilitating operations such as data writing, logical operations, bitline computing, and bit shifts. Analysis reveals that shift operations account for a significant portion of computation in certain algorithms, and the team implemented near-zero-cost shifting techniques, alongside hardware-supported bit extension, to maximize performance and energy efficiency. Error detection and correction schemes are also incorporated, ensuring reliability.
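The logical operations and shifts mentioned above can be modeled in software. The sketch below is a behavioral model only, written under the common compute-in-SRAM assumption that activating two wordlines at once yields the AND and NOR of the two stored rows on the bitlines, from which XOR can be derived. The function names are hypothetical and the code does not represent the CNC command set or circuit. The rotation helper shows the kind of fixed-distance shift that Keccak applies to its 64-bit lanes.

```python
def bitline_and_nor(row_a: int, row_b: int, width: int = 64) -> tuple[int, int]:
    """Model a two-row activation: returns (AND, NOR) of the stored rows."""
    mask = (1 << width) - 1
    and_bits = row_a & row_b
    nor_bits = ~(row_a | row_b) & mask
    return and_bits, nor_bits


def bitline_xor(row_a: int, row_b: int, width: int = 64) -> int:
    """Derive XOR from the AND and NOR results: XOR = NOR(AND, NOR)."""
    mask = (1 << width) - 1
    and_bits, nor_bits = bitline_and_nor(row_a, row_b, width)
    return ~(and_bits | nor_bits) & mask


def rotl(value: int, amount: int, width: int = 64) -> int:
    """Rotate `value` left by `amount` bits within a `width`-bit word."""
    mask = (1 << width) - 1
    amount %= width
    value &= mask
    return ((value << amount) | (value >> (width - amount))) & mask


a, b = 0b1100, 0b1010
print(bin(bitline_xor(a, b, width=4)))  # XOR built from AND and NOR
print(hex(rotl(1 << 63, 1)))            # top bit wraps around to bit 0
```

Because every column computes its result independently, a logical operation over a whole row takes the same time regardless of width, whereas a shift has to move bits between columns. That difference is one plausible reason shift handling receives dedicated attention in this kind of design.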
Crypto Acceleration via Near-Cache Computation
This research presents Near-Cache-Slice Computing, termed Crypto-Near-Cache (CNC), as a novel architecture designed to accelerate both existing and future cryptographic algorithms. By integrating SRAM arrays with bitline computing capabilities directly adjacent to cache slices, the team achieves high internal bandwidth and minimizes data movement, while maintaining native support for virtual addressing. An instruction set architecture extension was also developed to facilitate seamless integration of CNC into existing systems. The core achievement lies in substantial gains in energy efficiency and throughput for cryptographic workloads, accomplished through careful design of the core and cache datapaths, coupled with algorithm-specific optimization techniques such as near-zero-cost shifting and efficient Galois Field conversions.
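As a small illustration of why shifts and Galois Field arithmetic matter for a kernel such as AES, the sketch below implements standard multiplication in GF(2^8) using only shifts and XORs, with the AES reduction polynomial. This is textbook AES arithmetic with hypothetical function names. It is not the paper's Galois Field conversion technique, which is not detailed here.

```python
def xtime(value: int) -> int:
    """Multiply a GF(2^8) element by x (i.e. by 2) modulo the AES polynomial."""
    shifted = (value << 1) & 0xFF
    # If the top bit was set, reduce by x^8 + x^4 + x^3 + x + 1 (0x11B).
    return shifted ^ 0x1B if value & 0x80 else shifted


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements using repeated xtime and XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


# Worked example from the AES specification (FIPS 197): {57} * {13} = {fe}.
print(hex(gf_mul(0x57, 0x13)))
```

Each multiplication reduces to a chain of one-bit shifts and conditional XORs, so in a design where XOR is cheap, the cost of the shifts can dominate unless they are made nearly free.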
While the primary focus is performance and energy efficiency, the architecture also offers potential security benefits. The authors acknowledge that integrating CNC requires modifications to existing core and cache designs, necessitating thorough testing and verification. Future work will focus on extending the architecture to efficiently accelerate non-cryptographic workloads, a challenge requiring careful co-design of algorithms and hardware.
👉 More information
🗞 A Near-Cache Architectural Framework for Cryptographic Computing
🧠 ArXiv: https://arxiv.org/abs/2509.23179
