Energy-Efficient Vision Transformer Inference Achieves 53% Energy Reduction for Edge-AI Deployment on Jetson TX2

The increasing demand for artificial intelligence on portable devices presents a significant challenge, as powerful image recognition models often consume substantial energy. Researchers Nursultan Amanzhol and Jurn-Gyu Park of Nazarbayev University and their colleagues now address this issue by developing a comprehensive method for evaluating the energy efficiency of Vision Transformers, a leading type of image recognition technology. Their work moves beyond simply measuring accuracy, focusing instead on identifying models that deliver strong performance with minimal power usage. By benchmarking thirteen Vision Transformer models on standard image datasets and real-world hardware, including a portable edge device, the team demonstrates that certain hybrid and distilled models, such as LeViT and TinyViT, can cut energy consumption by more than 50% without sacrificing recognition accuracy, paving the way for more sustainable and practical artificial intelligence applications.

Vision Transformers (ViTs) now achieve state-of-the-art results in many computer vision tasks, but deploying them on power-constrained devices remains a significant challenge. This research investigates energy-efficient deployment of ViTs on edge devices, focusing on balancing accuracy and energy consumption. The team developed a two-stage process to identify optimal models for resource-constrained environments, combining device-independent model selection with direct measurements on target hardware. Results demonstrate that hybrid models, such as LeViT_Conv_192, reduce energy consumption by up to 53% on the Jetson TX2 edge device compared to standard ViT models, achieving a high score on a combined accuracy and efficiency metric.
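The article does not include the authors' measurement code, but the device-related stage amounts to sampling the board's power sensor while inference runs and integrating power over time. The sketch below illustrates this on a Jetson TX2, whose onboard INA3221 monitor exposes power readings through sysfs; the exact sysfs path is an assumption and varies across JetPack releases, so this is a minimal illustration rather than the authors' harness.

```python
# Minimal sketch of on-device energy measurement on a Jetson TX2:
# sample the INA3221 power monitor in a background thread while a
# workload runs, then integrate mean power over elapsed time.
# NOTE: the sysfs path below is an ASSUMPTION -- it differs across
# JetPack releases (try `find /sys -name 'in_power*'` on your board).
import threading
import time

POWER_PATH = "/sys/bus/i2c/drivers/ina3221x/0-0041/iio:device0/in_power0_input"  # mW, assumed path


class PowerSampler:
    """Samples board power (mW) in a background thread at a fixed interval."""

    def __init__(self, interval_s: float = 0.01):
        self.interval_s = interval_s
        self.samples_mw: list[float] = []
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        while not self._stop.is_set():
            with open(POWER_PATH) as f:
                self.samples_mw.append(float(f.read()))
            time.sleep(self.interval_s)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()


def measure_energy_j(run_inference, n_iters: int = 100) -> tuple[float, float]:
    """Returns (mean latency in seconds, total energy in joules).

    The workload must run long enough for the sampler to collect readings.
    """
    with PowerSampler() as sampler:
        t0 = time.perf_counter()
        for _ in range(n_iters):
            run_inference()
        elapsed = time.perf_counter() - t0
    mean_power_w = sum(sampler.samples_mw) / len(sampler.samples_mw) / 1000.0
    return elapsed / n_iters, mean_power_w * elapsed  # energy = avg power x time
```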

Distilled models such as TinyViT-11M_Distilled perform especially well on mobile GPUs. This research addresses the growing need to evaluate Vision Transformers not just on accuracy but also on energy efficiency, especially for deployment on edge devices. The authors argue that traditional metrics are insufficient and propose a two-stage evaluation pipeline called E3P-ViT. Key findings reveal a gap between theoretical efficiency predictions and real-world energy consumption, and show that energy efficiency depends heavily on both the hardware platform and the specific task (here, ImageNet-1K versus CIFAR-10 classification).

Hybrid architectures like LeViT and distilled models like TinyViT-11M_Distilled offer promising avenues for improving energy efficiency. The team benchmarked 13 ViT models on both the ImageNet-1K and CIFAR-10 datasets, running inference on an NVIDIA Jetson TX2 and an RTX 3050 mobile GPU.
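As an illustration of what such a benchmark harness can look like, the following sketch loads a few of the named architectures through the timm library and times a forward pass. The timm model identifiers are assumptions mapped from the names in the article (check timm.list_models() on your install), not the authors' exact configurations.

```python
# Illustrative latency benchmark for some of the evaluated architectures
# using timm and PyTorch. Model names below are assumed timm equivalents
# of the models named in the article, not the authors' exact variants.
import time

import timm
import torch

MODELS = [
    "levit_192",                            # hybrid conv/attention LeViT
    "tiny_vit_11m_224.dist_in22k_ft_in1k",  # distilled TinyViT-11M
    "vit_base_patch16_224",                 # standard ViT baseline
]

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1, 3, 224, 224, device=device)  # single 224x224 RGB image

for name in MODELS:
    model = timm.create_model(name, pretrained=False).to(device).eval()
    with torch.no_grad():
        for _ in range(10):                 # warm-up iterations
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()        # finish queued GPU work first
        t0 = time.perf_counter()
        for _ in range(100):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        ms = (time.perf_counter() - t0) / 100 * 1e3
    print(f"{name}: {ms:.2f} ms/image")
```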

The initial, device-independent stage screens candidates with a composite theoretical metric, narrowing the field before deployment. The subsequent device-related stage measures inference time, power, and energy consumption on the target hardware, ranking models with a combined accuracy and efficiency metric.

This research presents a comprehensive evaluation of Vision Transformer models, addressing the critical need to assess energy efficiency alongside accuracy. Extensive benchmarking across both edge devices and mobile GPUs shows that energy efficiency depends heavily on the specific hardware and task. The findings demonstrate that composite theoretical metrics alone can be misleading, and that hardware-aware evaluation is essential for sustainable AI deployment. Notably, hybrid models like LeViT_Conv_192 significantly reduce energy consumption on less powerful devices, while distilled models such as TinyViT-11M_Distilled excel on GPUs where accuracy is paramount. This work validates that achieving optimal performance requires careful consideration of both model architecture and the target hardware.
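To make the two-stage ranking concrete: the article does not give the exact formulas, so the sketch below uses illustrative stand-ins, accuracy per GFLOP as the device-independent screen and accuracy per joule as the device-related rank. All metrics and numbers here are hypothetical, not the paper's results.

```python
# Hedged sketch of a two-stage screen-then-rank pipeline. Both scoring
# functions are ILLUSTRATIVE stand-ins; the paper's actual composite
# metrics are not given in the article.
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    top1_acc: float                 # validation accuracy, 0..1
    gflops: float                   # theoretical compute per image
    energy_j: float | None = None   # measured on-device in stage two


def stage_one_screen(models: list[Candidate], keep: int) -> list[Candidate]:
    """Device-independent screen: keep the top `keep` models by a
    composite theoretical score (illustrative: accuracy per GFLOP)."""
    return sorted(models, key=lambda m: m.top1_acc / m.gflops, reverse=True)[:keep]


def stage_two_rank(models: list[Candidate]) -> list[Candidate]:
    """Device-related rank by measured efficiency (accuracy per joule)."""
    return sorted(models, key=lambda m: m.top1_acc / m.energy_j, reverse=True)


# Example with made-up numbers (NOT the paper's measurements):
pool = [
    Candidate("vit_base", 0.81, 17.6),
    Candidate("levit_192", 0.80, 0.7),
    Candidate("tinyvit_11m_distilled", 0.83, 2.0),
]
shortlist = stage_one_screen(pool, keep=2)
for m in shortlist:  # stand-in on-device energy measurements, joules/image
    m.energy_j = {"levit_192": 0.5, "tinyvit_11m_distilled": 1.1}[m.name]
print([m.name for m in stage_two_rank(shortlist)])
```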

👉 More information
🗞 Energy-Efficient Vision Transformer Inference for Edge-AI Deployment
🧠 arXiv: https://arxiv.org/abs/2511.23166

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.

Latest Posts by Rohail T.:

Graphene/InSe Heterostructures Exhibit Asymmetric Quantum Hall Effect and Vanishing Longitudinal Resistance at High Magnetic Fields

December 2, 2025
Video-R2 Enhances Multimodal Reasoning with Reinforcement Learning, Achieving Improved Consistency across 11 Benchmarks

December 1, 2025
Algorithmic Quantum Simulations Demonstrate Finite-Temperature Thermodynamic Properties with Quantitative Agreement

December 1, 2025