The escalating energy demands of artificial intelligence, particularly in image generation, present a growing environmental challenge, and researchers are now developing methods to accurately predict and mitigate this impact. Aniketh Iyengar, Jiaqi Han, and Boris Ruf, along with colleagues at AXA AI Research and Stefano Ermon, have established a set of scaling laws to quantify the energy consumption of diffusion models. This work moves beyond simply optimising model architecture or hardware, instead focusing on a principled way to predict energy use based on computational complexity. By analysing four state-of-the-art diffusion models across multiple GPU architectures and inference settings, the team demonstrates a highly accurate method for estimating energy consumption, even for previously unseen model and hardware combinations, paving the way for more sustainable AI development and carbon footprint assessment.
This work proposes an adaptation of established scaling laws to predict GPU energy consumption for diffusion models, based on computational complexity measured in floating point operations, or FLOPs. The approach decomposes diffusion model inference into text encoding, iterative denoising, and decoding components, hypothesising that denoising operations dominate energy consumption due to their repeated execution across multiple inference steps. Comprehensive experiments are conducted across four state-of-the-art diffusion models, namely Stable Diffusion 2, Stable Diffusion 3.5, Flux, and Qwen, on three different hardware platforms.
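The three-stage decomposition can be sketched as a simple FLOP budget. This is a minimal illustration, not the paper's actual cost model: the function name and the per-stage FLOP figures are hypothetical placeholders, chosen only to show why repeated denoising passes dominate total compute.

```python
# Hedged sketch: decompose diffusion inference compute into the three
# stages described above (text encoding, iterative denoising, decoding).
# All FLOP values here are illustrative placeholders, not measurements.

def total_inference_flops(encode_flops, denoise_flops_per_step,
                          decode_flops, steps, cfg=True):
    """Estimate total FLOPs for generating one image.

    With classifier-free guidance (cfg=True), each denoising step runs
    two forward passes (conditional + unconditional), doubling its cost.
    """
    passes_per_step = 2 if cfg else 1
    return (encode_flops
            + steps * passes_per_step * denoise_flops_per_step
            + decode_flops)

# Illustrative per-stage costs (arbitrary TFLOP units): encoding and
# decoding run once, denoising runs every step, so it dominates.
encode, denoise, decode = 0.1, 5.0, 1.0
print(total_inference_flops(encode, denoise, decode, steps=50))
print(total_inference_flops(encode, denoise, decode, steps=10, cfg=False))
```

At 50 steps with guidance, the denoising term contributes 100 forward passes against a single encode and decode, which is the intuition behind treating denoising as the dominant energy consumer.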
Diffusion Model Energy Prediction via Scaling Laws
Scientists developed a novel methodology to predict energy consumption in diffusion models, addressing a critical gap in sustainable artificial intelligence deployment. The study pioneers an adaptation of scaling laws, traditionally used for language models, to accurately estimate GPU energy usage during image generation. Researchers decomposed the diffusion model inference process into three key components: text encoding, iterative denoising, and decoding. They hypothesized that the repeated denoising operations constitute the dominant energy consumer. To validate this approach, the team conducted comprehensive experiments across four state-of-the-art diffusion models (Stable Diffusion 2, Stable Diffusion 3.5, Flux, and Qwen) on three distinct GPU architectures: NVIDIA A100, A4000, and A6000. Experiments spanned a wide range of inference configurations, systematically varying resolution from 256×256 to 1024×1024 pixels, precision levels of fp16 and fp32, step counts from 10 to 50, and classifier-free guidance settings. This rigorous experimental design enabled the team to capture the complex interplay between model architecture, hardware capabilities, and inference parameters. The resulting energy scaling law achieves high predictive accuracy within individual GPU architectures, demonstrating an R-squared value exceeding 0.9. Crucially, the methodology exhibits strong cross-architecture generalization, maintaining high rank correlations across different models and hardware combinations. This allows for reliable energy estimation even for unseen model-hardware pairings, a significant advancement for practical deployment planning. The study confirms that diffusion inference is fundamentally compute-bound, providing a foundation for optimizing energy efficiency and minimizing the carbon footprint of image generation technologies.
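The kind of per-architecture scaling law described above can be sketched as an ordinary least-squares fit of energy against FLOPs, scored by R². The (FLOPs, energy) pairs below are synthetic illustrative data, not the paper's measurements, and the exact functional form the authors use may differ.

```python
# Hedged sketch: fit a linear energy model, energy ≈ a * FLOPs + b,
# per GPU architecture, and score it with R-squared.
# Data points are synthetic, for illustration only.

def fit_linear(xs, ys):
    """Ordinary least-squares fit y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

def r_squared(xs, ys, a, b):
    """Coefficient of determination for the fitted line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Synthetic measurements on one GPU: energy (J) roughly proportional
# to total inference FLOPs (TFLOPs), with small noise.
flops = [50, 100, 200, 400, 800]
energy = [26, 51, 99, 205, 398]
a, b = fit_linear(flops, energy)
print(round(r_squared(flops, energy, a, b), 3))  # near 1.0 for near-linear data
```

A compute-bound workload is exactly the regime where such a one-variable fit can reach R² above 0.9: energy tracks arithmetic work rather than memory traffic or other confounders.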
Diffusion Model Energy Scaling Law Discovered
Scientists have developed a new scaling law to accurately predict the energy consumption of diffusion models, a type of artificial intelligence used for image generation. The research addresses a critical gap in understanding the energy demands of these increasingly popular models and provides a foundation for more sustainable AI deployment. The team adapted established scaling laws, originally used for language models, to specifically model the energy usage of diffusion models during image generation. Experiments involved four state-of-the-art diffusion models (Stable Diffusion 2, Stable Diffusion 3.5, Flux, and Qwen) tested across three GPU architectures: NVIDIA A100, A4000, and A6000. Researchers systematically varied inference configurations, including image resolution from 256×256 to 1024×1024, precision levels of fp16 and fp32, step counts ranging from 10 to 50, and the use of classifier-free guidance. The resulting energy scaling law demonstrates high predictive accuracy within each GPU architecture, achieving an R-squared value exceeding 0.9. Crucially, the model exhibits strong generalization capabilities, maintaining high rank correlations across different diffusion models and GPU architectures.
This allows for reliable energy estimation even for unseen combinations of models and hardware. Data confirms that iterative denoising is the dominant factor in energy consumption, as each step requires a full forward pass through the neural network. The team’s work validates the compute-bound nature of diffusion inference and provides a practical framework for planning sustainable AI deployments and accurately estimating carbon footprints.
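The rank-correlation check behind that generalization claim can be illustrated with a Spearman correlation between predicted and measured energies: even if a new GPU shifts the absolute energy scale, a useful scaling law should preserve the ordering of configurations. The function and data below are a hedged sketch with made-up numbers, not the paper's evaluation code.

```python
# Hedged sketch: cross-hardware generalization measured as rank
# correlation between energy predicted by a FLOPs-based law (fit on one
# GPU) and energy measured on a different GPU. Data is illustrative.

def spearman(xs, ys):
    """Spearman rank correlation (assumes no tied values)."""
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0] * len(vs)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Predicted energies from the law vs. measurements on an unseen GPU:
# the absolute scale differs, but the ordering is preserved.
predicted = [10, 25, 40, 80, 160]
measured = [14, 33, 52, 101, 210]
print(spearman(predicted, measured))  # 1.0: ranking perfectly preserved
```

A rank metric is the right tool here because deployment planning mostly needs to know *which* configuration is cheaper, not the exact joule count on every card.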
Diffusion Model Energy Scaling Laws Demonstrated
This work presents a new framework for predicting the energy consumption of diffusion models, a type of artificial intelligence used for image generation. Researchers developed scaling laws, extending principles previously used to predict computational demands, to accurately estimate energy use based on factors like model complexity, hardware, and image resolution. Comprehensive experiments across four state-of-the-art diffusion models and three GPU architectures demonstrate the framework’s high predictive accuracy, achieving R-squared values exceeding 0.9 within individual hardware setups. Notably, the framework successfully generalizes across different hardware, enabling reliable energy estimation even for previously unseen model-hardware combinations.
Results reveal that diffusion model energy consumption can vary significantly, spanning three orders of magnitude, with a single high-quality image requiring up to ten times the energy of a typical large language model query. This highlights the critical need for energy-conscious strategies in deploying these models. The research confirms that diffusion inference is primarily limited by computational demands and establishes a foundation for sustainable AI practices, including deployment planning and carbon footprint accounting. The authors acknowledge that their analysis focuses on the inference stage of image generation and does not fully encompass the energy demands of a complete production pipeline. Future work will extend this analysis to video diffusion models and explore real-time carbon-aware optimization techniques, alongside validation on a wider range of hardware, including emerging AI accelerators. This research provides both theoretical insights and practical tools to guide environmental optimization in machine learning and promote more sustainable AI development.
👉 More information
🗞 Energy Scaling Laws for Diffusion Models: Quantifying Compute and Carbon Emissions in Image Generation
🧠 ArXiv: https://arxiv.org/abs/2511.17031
