The paper discusses Analog-Based In-Memory Computing (AIMC) inference accelerators used to execute Deep Neural Network (DNN) inference workloads. It proposes two Post-Training (PT) optimization methods that improve accuracy after training while reducing the complexity of, and the amount of hardware information required during, Hardware-Aware (HWA) training. These methods are applied to the pre-trained RoBERTa transformer model across all General Language Understanding Evaluation (GLUE) benchmark tasks. The study uses an experimentally calibrated hardware model for simulations, based on an array containing 1 million Phase-Change Memory (PCM) devices.
Introduction to Analog-Based In-Memory Computing (AIMC) Accelerators
Analog-Based In-Memory Computing (AIMC) inference accelerators are used to efficiently execute Deep Neural Network (DNN) inference workloads. However, to mitigate accuracy losses due to circuit and device non-idealities, Hardware-Aware (HWA) training methodologies must be employed. These methodologies typically require significant information about the underlying hardware.
Post-Training Optimization Methods
In this paper, two Post-Training (PT) optimization methods are proposed to improve accuracy after training has been performed. The first optimizes the conductance range of each column in a crossbar, and the second optimizes the input range, i.e., the Digital-to-Analog Converter (DAC) range. It is demonstrated that, when these methods are employed, both the complexity of training and the amount of information required about the underlying hardware can be reduced with no notable change in accuracy.
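A minimal sketch of how such a post-training calibration could look in practice is given below: (a) a per-column conductance-range scale and (b) the DAC input clipping range are chosen by grid search against floating-point reference outputs on a small calibration set. The noise/clipping model, search grids, and error metric are illustrative assumptions, not the paper's exact optimization procedure.

```python
# Hedged sketch: post-training calibration of (a) per-column conductance-range
# scales and (b) the DAC input range, via grid search against FP reference
# outputs on a calibration set. All modelling details are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def quantize_input(x, x_max, bits=8):
    """Clip to [-x_max, x_max] and quantize to a signed `bits`-bit grid (DAC model)."""
    levels = 2 ** (bits - 1) - 1
    xq = np.clip(x, -x_max, x_max)
    return np.round(xq / x_max * levels) / levels * x_max

def analog_vmm(x, W, col_scale, g_max=1.0, noise_std=0.1):
    """Toy analog VMM: per-column weight scaling, conductance clipping at g_max,
    and additive Gaussian read noise (all modelling assumptions)."""
    G = np.clip(W * col_scale, -g_max, g_max)      # programmed conductances
    y = x @ G + noise_std * rng.standard_normal((x.shape[0], W.shape[1]))
    return y / col_scale                           # per-column digital rescaling

# Calibration inputs and a "pre-trained" layer weight (random stand-ins).
X_cal = rng.standard_normal((256, 512))
W = rng.standard_normal((512, 512)) / np.sqrt(512)
Y_ref = X_cal @ W                                  # FP reference outputs

# (b) DAC range: choose the input clipping value that minimizes output MSE.
dac_grid = np.linspace(0.5, 4.0, 15) * X_cal.std()
dac_err = [np.mean((quantize_input(X_cal, m) @ W - Y_ref) ** 2) for m in dac_grid]
x_max = dac_grid[int(np.argmin(dac_err))]

# (a) Per-column conductance scale: evaluate a shared candidate grid and keep
# the best scale for each output column independently.
candidates = np.linspace(2.0, 12.0, 21)
per_col_err = np.stack([
    ((analog_vmm(quantize_input(X_cal, x_max), W, s) - Y_ref) ** 2).mean(axis=0)
    for s in candidates
])                                                  # shape: (n_candidates, 512)
col_scale = candidates[np.argmin(per_col_err, axis=0)]

print(f"calibrated DAC range: {x_max:.2f}, per-column scales in "
      f"[{col_scale.min():.1f}, {col_scale.max():.1f}]")
```

Because only forward passes on a small calibration set are needed, this kind of search requires neither gradients nor detailed device models at training time, which is the appeal of doing the optimization post-training.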
Application of PT Optimization Methods
The PT optimization methods are applied to the pre-trained RoBERTa transformer model for all General Language Understanding Evaluation (GLUE) benchmark tasks. The results show that further optimizing learned parameters post-training improves accuracy.
Analog-Based In-Memory Computing (AIMC) Accelerators
AIMC accelerators are capable of performing Vector-Matrix Multiplications (VMMs) in O(1)-time complexity. They have gained significant interest due to their ability to execute these operations efficiently. However, networks trained for deployment on conventional compute hardware with Floating-Point (FP) precision parameters, such as Graphics Processing Units (GPUs), require retraining when deployed on AIMC hardware to achieve near- or at-iso accuracy.
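The sketch below illustrates both points: every weight becomes a conductance, so Ohm's law performs the multiplications and Kirchhoff's current law sums each column in parallel (hence O(1) time), but imperfect programming perturbs the stored values and degrades the result. The Gaussian programming-noise model and its magnitude are assumptions; real PCM non-idealities are more complex.

```python
# Hedged sketch: why FP-trained weights lose accuracy when executed as an
# analog VMM. Conductances are perturbed with a simple Gaussian programming-
# noise model (an assumption) and compared against the exact FP result.
import numpy as np

rng = np.random.default_rng(1)

W = rng.standard_normal((512, 128)) / np.sqrt(512)   # FP-trained weights
x = rng.standard_normal(512)                          # one input activation vector

# Analog execution: all multiply-accumulates of a column happen concurrently
# in the crossbar, which is why the whole VMM completes in O(1) time steps.
sigma = 0.05 * np.abs(W).max()                        # assumed programming-noise level
G = W + sigma * rng.standard_normal(W.shape)          # noisy programmed conductances

y_fp = x @ W
y_analog = x @ G

rel_err = np.linalg.norm(y_analog - y_fp) / np.linalg.norm(y_fp)
print(f"relative output error from device noise: {rel_err:.3%}")
```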
Hardware-Aware Training Techniques
Hardware-Aware (HWA) training techniques, such as Quantization-Aware Training (QAT), are widely adopted due to the proliferation of reduced precision digital accelerators and deterministic execution flows. However, some of these techniques require instance-specific information and cannot easily be generalized for different hardware architectures.
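As a point of reference, the core building block of QAT is fake quantization with a straight-through estimator: weights are quantized in the forward pass while gradients flow to the full-precision copy. The sketch below is a generic illustration of that idea, not the specific HWA recipe used in the paper.

```python
# Hedged sketch: fake quantization with a straight-through estimator, the
# basic mechanism behind Quantization-Aware Training (QAT).
import torch

class FakeQuant(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w, bits=8):
        # Symmetric uniform quantization to a signed `bits`-bit grid.
        levels = 2 ** (bits - 1) - 1
        scale = w.abs().max() / levels
        return torch.round(w / scale).clamp(-levels, levels) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pass the gradient through unchanged.
        return grad_output, None

class QATLinear(torch.nn.Linear):
    def forward(self, x):
        w_q = FakeQuant.apply(self.weight, 8)   # quantize weights on the fly
        return torch.nn.functional.linear(x, w_q, self.bias)

layer = QATLinear(512, 512)
out = layer(torch.randn(4, 512))
out.sum().backward()                            # gradients reach the FP weights
print(layer.weight.grad.shape)                  # torch.Size([512, 512])
```

Techniques of this kind bake a specific quantization (or device) model into training, which is why they can require instance-specific hardware information and may not transfer easily across architectures.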
The GLUE Benchmark and RoBERTa
The GLUE benchmark is a collection of resources for training, evaluating, and analyzing natural language understanding systems. In this paper, the base RoBERTa model, a ubiquitous bidirectional transformer based on BERT with approximately 125M learnable parameters, is used.
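For orientation, the base RoBERTa model and the GLUE tasks are readily available through the Hugging Face `transformers` and `datasets` libraries. The sketch below loads the model, confirms the roughly 125M-parameter count mentioned above, and runs a few SST-2 validation sentences through an (untrained) classification head; the choice of SST-2 is arbitrary.

```python
# Hedged sketch: loading roberta-base and one GLUE task (SST-2) with Hugging
# Face libraries. Requires `transformers` and `datasets`; downloads on first run.
from datasets import load_dataset
from transformers import RobertaForSequenceClassification, RobertaTokenizerFast

model = RobertaForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")

n_params = sum(p.numel() for p in model.parameters())
print(f"learnable parameters: {n_params / 1e6:.1f}M")    # roughly 125M

sst2 = load_dataset("glue", "sst2")
batch = tokenizer(sst2["validation"]["sentence"][:4], padding=True, return_tensors="pt")
logits = model(**batch).logits
print(logits.shape)                                      # (4, 2)
```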
Hardware Model for Simulations
To perform realistic hardware simulations, an experimentally verified model, calibrated using extensive measurements on an array containing 1 million Phase-Change Memory (PCM) devices, is used. A tile size of 512×512 is assumed. Inputs are encoded using 8-bit Pulse-Width Modulation (PWM), and weights are represented using a standard differential weight-mapping scheme.
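The sketch below shows this mapping in digital form: a layer's weight matrix is partitioned into 512×512 tiles, each weight is split into a differential conductance pair (G+, G−), and inputs are quantized to 8-bit pulse-width durations. The conductance units, the scaling, and the restriction of inputs to [0, 1] (real PWM schemes handle signed inputs separately) are placeholders, not the paper's calibrated PCM values.

```python
# Hedged sketch: tiling, differential weight mapping, and 8-bit PWM input
# encoding. Units and scaling are illustrative placeholders.
import numpy as np

TILE = 512
G_MAX = 25.0          # assumed maximum device conductance (arbitrary units)
PWM_BITS = 8

def map_to_tiles(W):
    """Split W into (G_plus, G_minus) tile pairs of size at most TILE x TILE."""
    scale = G_MAX / np.abs(W).max()
    tiles = []
    for r in range(0, W.shape[0], TILE):
        for c in range(0, W.shape[1], TILE):
            w = W[r:r + TILE, c:c + TILE] * scale
            g_plus = np.clip(w, 0, None)          # positive part on the "+" devices
            g_minus = np.clip(-w, 0, None)        # negative part on the "-" devices
            tiles.append(((r, c), g_plus, g_minus))
    return tiles, scale

def pwm_encode(x):
    """Quantize inputs in [0, 1] to 8-bit pulse-width durations."""
    durations = np.round(np.clip(x, 0, 1) * (2 ** PWM_BITS - 1))
    return durations / (2 ** PWM_BITS - 1)

W = np.random.default_rng(2).standard_normal((768, 768)) / np.sqrt(768)
tiles, scale = map_to_tiles(W)
x = pwm_encode(np.random.default_rng(3).random(768))

# Ideal tile-by-tile execution: "+" and "-" column currents are subtracted,
# then partial results are accumulated across tiles and rescaled digitally.
y = np.zeros(768)
for (r, c), g_plus, g_minus in tiles:
    y[c:c + g_plus.shape[1]] += x[r:r + g_plus.shape[0]] @ (g_plus - g_minus) / scale

print(f"{len(tiles)} tiles of at most {TILE}x{TILE}, "
      f"max mapping error: {np.abs(y - x @ W).max():.2e}")
```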
“Improving the Accuracy of Analog-Based In-Memory Computing Accelerators Post-Training” by Corey Lammie, Athanasios Vasilopoulos, Julian Büchel, Giacomo Camposampiero, Manuel Le Gallo, Malte J. Rasch, and Abu Sebastian. Published on January 18, 2024. https://doi.org/10.48550/arxiv.2401.09859
