The increasing deployment of artificial intelligence on edge devices, such as smartphones and embedded systems, demands training methods that minimise computational cost and energy consumption. Traditional neural network training relies heavily on backpropagation, an algorithm that, while effective, requires substantial memory and processing power. Researchers are now exploring alternatives, including the Forward-Forward algorithm, which replaces the computationally intensive backward pass with an additional forward pass, potentially reducing memory requirements. In “FF-INT8: Efficient Forward-Forward DNN Training on Edge Devices with INT8 Precision”, Jingxiao Ma of Brown University, Priyadarshini Panda of Yale University, and Sherief Reda, also of Brown University, present an approach to training deep neural networks on resource-constrained hardware that combines INT8 precision, which reduces the number of bits used to represent each value, with a layer-by-layer training strategy to improve stability. Their experiments on a Jetson Orin Nano board demonstrate gains in speed, energy efficiency, and memory usage while maintaining accuracy comparable to existing methods.
Forward-Forward Training (FFT) presents a novel methodology for training deep neural networks, utilising INT8 integer arithmetic to address the computational demands of resource-constrained devices. The approach circumvents the traditional reliance on backpropagation, instead employing a ‘goodness’ score to assess each layer’s output through forward passes alone. Backpropagation, the standard algorithm for training neural networks, calculates gradients (the rate of change of the error with respect to the weights) and propagates them backwards through the network to adjust those weights. FFT eliminates this backward pass, reducing memory requirements and potentially accelerating training.
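To make the contrast concrete, the sketch below illustrates the goodness idea in the style of Hinton’s original Forward-Forward formulation, where goodness is the mean of squared activations and a layer ‘accepts’ an input when goodness exceeds a threshold; the exact score used in FF-INT8 may differ.

```python
import numpy as np

def layer_forward(x, W):
    """A single forward pass through a fully connected layer with ReLU."""
    return np.maximum(0.0, x @ W)

def goodness(h):
    """Goodness of a layer's output: mean of squared activations."""
    return np.mean(h ** 2, axis=-1)

# A layer 'accepts' an input when its goodness exceeds a threshold theta.
# Training only ever runs forward passes like the one above, so no
# intermediate activations need to be stored for a backward sweep.
theta = 2.0
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(784, 256))
x = rng.normal(size=(8, 784))                     # stand-in for a data batch
p_accept = 1.0 / (1.0 + np.exp(-(goodness(layer_forward(x, W)) - theta)))
```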
Experimental results consistently demonstrate FFT’s superior accuracy when compared to established INT8 training techniques, including post-training quantisation and quantisation-aware training. Post-training quantisation converts a fully trained model to lower precision, while quantisation-aware training incorporates quantisation into the training process to mitigate accuracy loss. These tests were conducted across multiple datasets, notably ImageNet and CIFAR-10, and diverse model architectures such as ResNet-50, MobileNetV2, and EfficientNet-B0. The method also exhibits increased robustness to quantisation noise – the errors introduced by representing data with limited precision – and reduced sensitivity to hyperparameter tuning, simplifying the overall training procedure.
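The snippet below is a rough illustration of the INT8 conversion that both techniques rely on, assuming symmetric per-tensor scaling for simplicity; it also shows where the ‘quantisation noise’ mentioned above comes from.

```python
import numpy as np

def quantise_int8(x):
    """Map float values to INT8 with a symmetric per-tensor scale."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantise(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).normal(size=(64, 64)).astype(np.float32)
q, scale = quantise_int8(w)
quantisation_noise = np.abs(dequantise(q, scale) - w).mean()
# Post-training quantisation applies this conversion once, after training;
# quantisation-aware training simulates it inside the training loop so the
# network can adapt to the rounding error.
```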
The impetus for developing FFT stems from the limitations of conventional training methods when deploying models on edge devices. These devices, characterised by limited computational resources and memory, struggle with the memory bottleneck created by storing intermediate activations required for gradient calculation during backpropagation. FFT’s direct assessment of a layer’s output ‘goodness’ streamlines the process and significantly reduces the memory footprint.
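A back-of-envelope calculation makes the bottleneck tangible; the feature-map sizes below are hypothetical, loosely modelled on a ResNet-style CNN, and serve only to show the scale of activations that backpropagation must retain.

```python
# Every layer's output must be held in memory until the backward pass
# reaches it; forward-only training can discard each activation as soon
# as that layer has been updated. Shapes below are illustrative only.
batch = 32
feature_maps = [(64, 112, 112), (256, 56, 56), (512, 28, 28),
                (1024, 14, 14), (2048, 7, 7)]          # hypothetical shapes
floats_kept = sum(batch * c * h * w for c, h, w in feature_maps)
print(f"~{floats_kept * 4 / 1e6:.0f} MB of FP32 activations retained for backprop")
```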
At its core, FFT learns representations directly from data without propagating error gradients backwards through the network. The algorithm defines a goodness function that quantifies how well a layer’s output represents the input data. Each layer then learns to maximise this goodness for real (positive) examples and minimise it for contrived (negative) examples, using only forward passes, so the network learns to differentiate between correct and incorrect data. This contrasts with traditional methods, where the network learns by minimising a single loss function that measures the difference between predicted and actual outputs.
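A minimal sketch of one such layer-local update is shown below, again assuming the sum-of-squares goodness and logistic objective from Hinton’s original paper rather than the authors’ exact formulation; the key point is that each layer adjusts its own weights from its own forward pass, with no error signal arriving from later layers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def local_ff_step(W, x_pos, x_neg, theta=2.0, lr=0.03):
    """One layer-local update: raise goodness on positive data, lower it on negative."""
    for x, sign in ((x_pos, 1.0), (x_neg, -1.0)):
        h = np.maximum(0.0, x @ W)                        # forward pass only
        g = np.mean(h ** 2, axis=1)                       # goodness per sample
        p = sigmoid(sign * (g - theta))                   # probability the layer is 'right'
        dL_dg = sign * (p - 1.0)                          # gradient of -log(p) w.r.t. goodness
        dL_dh = dL_dg[:, None] * (2.0 * h / h.shape[1])   # chain rule through the mean of squares
        dL_dz = dL_dh * (h > 0)                           # ReLU mask
        W -= lr * (x.T @ dL_dz) / x.shape[0]              # local weight update, no backward pass
    return W

# Usage with hypothetical positive (real) and negative (corrupted) batches.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(784, 256))
W = local_ff_step(W, rng.normal(size=(32, 784)), rng.normal(size=(32, 784)))
```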
Researchers implemented a dynamic quantisation scheme within FFT to further enhance performance. This scheme adjusts quantisation parameters during training, minimising information loss inherent in the reduced precision. By dynamically adapting to the data, the algorithm maintains accuracy despite the lower precision, achieving a balance between efficiency and performance.
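What such a scheme might look like is sketched below: the per-tensor scale is re-estimated from the current tensor statistics at every step, here via an exponential moving average of the absolute maximum, so the INT8 range tracks the data as it shifts during training. This is an illustration of the general idea, not the authors’ exact scheme.

```python
import numpy as np

class DynamicInt8Quantiser:
    """Re-estimates the INT8 scale each step from running tensor statistics."""
    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.running_absmax = None

    def __call__(self, x):
        absmax = float(np.max(np.abs(x)))
        if self.running_absmax is None:
            self.running_absmax = absmax          # initialise on the first step
        else:
            self.running_absmax = (self.momentum * self.running_absmax
                                   + (1.0 - self.momentum) * absmax)
        scale = self.running_absmax / 127.0
        q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
        return q, scale

# As activation statistics drift over training steps, the scale follows them.
quantise = DynamicInt8Quantiser()
rng = np.random.default_rng(2)
for step in range(3):
    activations = rng.normal(scale=1.0 + 0.5 * step, size=(32, 256))
    q, scale = quantise(activations)
```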
Experiments demonstrate a substantial reduction in both energy consumption and memory usage. Testing on a Jetson Orin Nano board, a common platform for edge computing, revealed a 27.0% reduction in memory footprint and an 8.3% reduction in energy consumption, alongside a decrease in training time. These results highlight the potential for deploying more efficient AI systems on resource-constrained devices.
Validation across a diverse range of datasets and model architectures confirms FFT’s versatility. The method successfully trained various Convolutional Neural Network (CNN) architectures, demonstrating its adaptability to different network structures and data types.
Implementing FFT on a Jetson Orin Nano board, a realistic testbed for resource-constrained environments, provides compelling evidence of its practical benefits and allows a more accurate assessment of the method’s potential in real-world applications.
Researchers observed that FFT exhibits reduced sensitivity to hyperparameter tuning compared to traditional training methods, simplifying the training process and reducing the effort required to achieve optimal performance. This ease of use is a significant advantage for developers working with limited resources or expertise.
Researchers believe that FFT has the potential to unlock new possibilities for deploying deep learning models on edge devices. By reducing the computational and memory requirements of training, the method enables the development of more efficient and sustainable AI systems.
Future research directions include exploring the theoretical properties of FFT and developing more sophisticated goodness functions, as well as investigating the application of FFT to other machine learning tasks. This ongoing research aims to further refine the method and expand its applicability.
In conclusion, FFT represents a significant advancement in the field of efficient deep learning. By eliminating the need for backpropagation and optimising the goodness function, the method enables the development of high-performance, low-precision neural networks suitable for deployment on resource-constrained devices. The demonstrated reductions in memory usage, energy consumption, and training time, coupled with the method’s robustness and ease of use, make it a promising solution for unlocking the full potential of AI in edge computing applications.
👉 More information
🗞 FF-INT8: Efficient Forward-Forward DNN Training on Edge Devices with INT8 Precision
🧠 DOI: https://doi.org/10.48550/arXiv.2506.22771
