The demand for increasingly complex data processing necessitates innovations in computational architectures, particularly in artificial intelligence and machine learning. Current von Neumann architectures, in which memory and processing units are physically separated, present bottlenecks in speed and energy efficiency. Researchers are actively exploring ‘in-memory computing’ paradigms, which integrate computation directly within the memory itself, to overcome these limitations. M. A. Karamuftuoglu, C. Song, and colleagues from the University of Southern California’s Ming Hsieh Department of Electrical and Computer Engineering detail a novel approach to in-memory matrix-vector multiplication (MVM) utilising superconducting circuits in their article, “Optimized Bistable Vortex Memory Arrays for Superconducting In-Memory Matrix-Vector Multiplication”.
Conventional computing architectures increasingly struggle to meet growing processing demands, particularly in data-intensive applications such as artificial intelligence and machine learning. The von Neumann architecture, characterised by its separation of processing and memory, creates a bottleneck: the time and energy consumed transferring data between these units limits overall system performance. Superconducting electronics offer a potential solution, providing significantly faster switching speeds and lower power consumption than conventional CMOS technology, and driving research into alternative computing paradigms built on superconducting circuits such as Rapid Single Flux Quantum (RSFQ) and Adiabatic Quantum Flux Parametron (AQFP) logic.
Current superconducting designs often rely on pipelined digital logic for operations such as multiplication, achieving high throughput but at the cost of increased complexity and area, which hinders scalability. These limitations motivate the exploration of novel architectures that minimise data movement and maximise computational density, paving the way for more efficient and scalable computing systems. Matrix-vector multiplication, a fundamental operation in neural networks, signal processing, and many other algorithms, is particularly demanding in terms of computational resources.
In-memory computing, which integrates computation directly within the memory array, offers a promising way to overcome these challenges by eliminating the data-transfer bottleneck and enabling more energy-efficient computation. Recent in-memory computing research follows two main paths, volatile and non-volatile, each with its own advantages and disadvantages.
This work builds upon a novel non-volatile memory technology called Bistable Vortex Memory (BVM), which offers high density, scalability, and the potential for ultra-fast operation, positioning it as a promising candidate for in-memory computing applications. By leveraging the inherent properties of BVM, researchers aim to create a highly efficient and scalable architecture for in-memory matrix-vector multiplication, addressing the limitations of conventional computing systems.
The core of this innovation lies in the unique properties of BVMs, which are non-volatile memory elements capable of storing information using persistent circulating currents, or vortices. These vortices exhibit bistability, meaning they can exist in one of two stable states, representing binary data, and enabling robust and reliable data storage. By arranging BVMs into arrays, researchers can leverage their inherent current summation capabilities to perform multiplication, streamlining the computational process.
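To make the current-summation idea concrete, the following minimal behavioural sketch (in Python) treats each BVM cell as a stored bit that contributes one unit of persistent current to a shared Sense Line whenever its word line is driven. The unit current `I_V` and the function name are illustrative assumptions, not details taken from the paper.

```python
# Behavioural sketch of BVM current summation (illustrative model,
# not the authors' circuit netlist). Each cell stores one bit as the
# presence or absence of a persistent vortex current; driving a word
# line lets matching cells add their current to a shared sense line.

I_V = 1.0  # assumed, normalised current contributed by a '1' cell

def sense_line_current(stored_bits, input_bits):
    """Analog current accumulated on one sense line.

    A cell contributes I_V only when it stores a vortex AND its word
    line is active this cycle, so the total is a bitwise dot product.
    """
    return I_V * sum(s & x for s, x in zip(stored_bits, input_bits))

# Example: stored word 1011 sensed with input 1101 -> 2 units.
print(sense_line_current([1, 0, 1, 1], [1, 1, 0, 1]))  # 2.0
```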
The design employs a tiled multiplier structure in which the BVM array accumulates current proportional to the product of input values, enhancing computational density and efficiency. Crucially, the accumulated analog current is then converted into digital signals by Quantizer Buffer cells, which generate a variable number of Single Flux Quantum (SFQ) pulses, the fundamental unit of digital information in superconducting circuits. To complete the multiplication, these SFQ pulses are processed by T1 adder cells, which perform binary addition and carry propagation.
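The readout chain can be mimicked in the same behavioural style. In the sketch below, a quantizer emits one SFQ pulse per unit of sense-line current, and a chain of T1-style cells resolves the pulse counts gathered at each binary weight into an integer through carry propagation. The quantisation threshold, the cell model, and the names are simplifying assumptions rather than the authors' circuit-level design.

```python
# Hedged model of the Quantizer Buffer and T1 adder chain described
# above; all behaviour is idealised.

def quantize(current, unit=1.0):
    """Quantizer Buffer: emit one SFQ pulse per unit of current."""
    return int(round(current / unit))

def t1_adder_chain(column_counts):
    """Resolve SFQ pulse counts per binary weight into an integer.

    column_counts[k] is the number of pulses on the column of weight
    2**k. Each T1-style cell keeps the parity of the pulses reaching
    its column and forwards floor(total/2) as carries to the next.
    """
    result, carry = 0, 0
    for k, count in enumerate(column_counts):
        total = count + carry
        result |= (total & 1) << k   # bit retained at this weight
        carry = total >> 1           # carries passed up the chain
    return result | (carry << len(column_counts))

# Example: pulse counts [3, 1, 2] on weights 1, 2, 4
# resolve to 3*1 + 1*2 + 2*4 = 13.
print(t1_adder_chain([quantize(3.0), quantize(1.0), quantize(2.0)]))  # 13
```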
By arranging these BVM-based multipliers in a systolic array configuration, the researchers achieve parallel processing, significantly accelerating matrix-vector multiplication and improving overall system performance. A key refinement of this design involves optimising the BVM array structure specifically for multiplication, enhancing efficiency and reducing area. This optimisation reroutes the Sense Lines, the connections used to read data from the BVMs, along diagonals to shrink the overall circuit area and improve signal integrity. Furthermore, the input scheme is adjusted for greater efficiency than general-purpose BVM array designs, streamlining data flow and reducing computational overhead.
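As a rough picture of the dataflow (not of the circuit timing), the systolic arrangement can be sketched as tiles that each hold one matrix element and fold their product into a partial sum carried along the row, so every output element is computed in parallel. The tile model below is a hypothetical simplification.

```python
# Dataflow-only sketch of a systolic matrix-vector multiply; the real
# tiles are BVM-based multipliers, modelled here as plain products.

def systolic_mvm(matrix, vector):
    """y = M @ x expressed as partial sums flowing through tiles."""
    partial = [0] * len(matrix)           # one running sum per row
    for col, x in enumerate(vector):      # input enters column by column
        for row in range(len(matrix)):
            partial[row] += matrix[row][col] * x   # tile (row, col) fires
    return partial

M = [[1, 0, 1, 1],
     [0, 1, 1, 0]]
print(systolic_mvm(M, [1, 2, 3, 4]))  # [8, 5]
```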
The efficacy of this approach is demonstrated through simulations of a 4-bit multiplier operating at an impressive 20 GHz with a latency of just 50 picoseconds, validating the potential for ultra-fast computation. The researchers also showcase how this multiplier design can be extended to support Multiply-Accumulate (MAC) operations, which are even more prevalent in neural networks, expanding its applicability and versatility. This work paves the way for the development of power-efficient neural networks by enabling high-speed in-memory computing, potentially overcoming the energy bottlenecks that currently limit the scalability of artificial intelligence applications.
This research presents a novel architecture for matrix-vector multiplication built on bistable vortex memory, a superconducting technology offering high density and scalability, aimed squarely at the growing demands of data-intensive applications. Building on prior work establishing BVM as a non-volatile memory, the study performs the arithmetic of data-driven algorithms and neural networks directly within the memory itself.
In this architecture, BVM arrays convert accumulated analog current into a variable number of single flux quantum (SFQ) pulses representing digital information; the pulses then propagate through T1 adder cells, which perform binary addition and carry propagation to complete each multiplication.
Simulations demonstrate a 4-bit multiplier operating at 20 GHz with a latency of 50 picoseconds, validating the potential for ultra-fast computation. The researchers further show that the architecture extends to multiply-accumulate (MAC) operations, which are essential to many machine learning algorithms.
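One way to picture the MAC extension, under the same pulse-count abstraction used earlier, is to merge the bits of the previous accumulator into the adder columns alongside the pulses from the new product's partial products, so accumulation happens in the same domain as multiplication. The decomposition and the function names below are illustrative assumptions, not the published cell design.

```python
# Behavioural sketch of a multiply-accumulate built on pulse counts.

def resolve(columns):
    """T1-style carry resolution: pulse counts per weight -> integer."""
    value, carry = 0, 0
    for k, count in enumerate(columns):
        total = count + carry
        value |= (total & 1) << k
        carry = total >> 1
    return value | (carry << len(columns))

def mac(acc, a, b, width=8):
    """One MAC step: acc <- acc + a*b, merged as pulse counts."""
    columns = [0] * width
    for i in range(width):                 # partial products of a*b
        if (b >> i) & 1:
            for j in range(width):
                if (a >> j) & 1 and i + j < width:
                    columns[i + j] += 1    # one pulse at weight 2**(i+j)
    for k in range(width):                 # inject old accumulator bits
        columns[k] += (acc >> k) & 1
    return resolve(columns)

# Accumulating a small dot product, as a neural-network layer would.
acc = 0
for w, x in [(3, 4), (2, 5), (1, 6)]:
    acc = mac(acc, w, x)
print(acc)  # 3*4 + 2*5 + 1*6 = 28
```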
The full MVM structure also operates at 20 GHz, showcasing its capacity for parallel processing. The presented work establishes a pathway towards power-efficient neural networks by enabling high-speed in-memory computation, addressing the energy bottlenecks that currently limit the scalability of artificial intelligence applications. By performing calculations directly within the memory array, the architecture minimises data movement, a significant source of energy consumption in conventional computing systems, and thereby enhances overall energy efficiency.
Future work focuses on scaling the BVM array to larger dimensions and exploring alternative BVM cell designs to further optimise performance and density. Integrating the multiplier with other superconducting logic circuits will also be crucial for building complete, high-performance computing platforms, and applying the architecture to workloads beyond matrix-vector multiplication will broaden its impact and demonstrate its versatility.
👉 More information
🗞 Optimized Bistable Vortex Memory Arrays for Superconducting In-Memory Matrix-Vector Multiplication
🧠 DOI: https://doi.org/10.48550/arXiv.2507.04648
