AI Predicts GPU Performance From Code Using Large Language Models

The optimisation of computational performance remains a central challenge in modern computing, particularly as the complexity of deep neural networks continues to increase. Predicting the performance of individual GPU programs, known as kernels, is crucial for efficient resource allocation and code refinement. Researchers now present a novel approach utilising large language models (LLMs) to forecast GPU kernel performance directly from source code, bypassing traditional, time-consuming execution and profiling methods. Zixian Wang from the University of Illinois Urbana-Champaign and Muhammad A. Awad from AMD detail this work in their paper, “Omniwise: Predicting GPU Kernels Performance with LLMs”, demonstrating a lightweight, model-agnostic pipeline that places over 90% of its predictions within 10% relative error on AMD’s MI250 and MI300X architectures.

The optimisation of applications utilising Graphics Processing Units (GPUs) demands innovative methods for performance analysis, and the researchers have introduced Omniwise, a system employing LLMs to predict GPU kernel performance with notable accuracy. Unlike conventional profiling techniques, which require executing the code under dedicated tools, Omniwise predicts key performance metrics directly from the kernel source itself.

Omniwise places over 90% of its predictions within a 10% relative error margin when tested on MI250 and MI300X GPU architectures. Its design is model-agnostic and lightweight, delivering strong results even with a relatively small 3-billion-parameter model. The researchers demonstrate the LLM’s capacity to estimate crucial performance indicators, including memory bandwidth, cache hit rates, giga floating-point operations per second (GFLOPS), and arithmetic intensity. GFLOPS measures a processor’s floating-point throughput, while cache hit rate indicates how frequently requested data is found in the faster cache memory, which strongly affects overall speed.
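For context, these metrics are conventionally derived from hardware counters gathered by a profiler. The short Python sketch below shows how they are typically computed; the function and counter values are illustrative assumptions, not the paper’s data schema, and Omniwise’s contribution is predicting such values from source code alone.

```python
def kernel_metrics(flops: float, bytes_moved: float,
                   cache_hits: int, cache_accesses: int,
                   elapsed_s: float) -> dict:
    """Roofline-style metrics for one kernel launch (illustrative only)."""
    return {
        "gflops": flops / elapsed_s / 1e9,               # floating-point throughput
        "bandwidth_gbs": bytes_moved / elapsed_s / 1e9,  # effective memory bandwidth
        "cache_hit_rate": cache_hits / cache_accesses,   # fraction served from cache
        "arithmetic_intensity": flops / bytes_moved,     # FLOPs per byte of traffic
    }

# Hypothetical counters: 2e12 FLOPs over 5e11 bytes in 1.25 s.
print(kernel_metrics(2e12, 5e11, 9_000_000, 10_000_000, 1.25))
```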

Central to Omniwise’s functionality is the prediction of arithmetic intensity, the ratio of floating-point operations performed to bytes of data moved to and from memory. Higher values generally indicate that a kernel is compute-bound rather than memory-bound, spending more of its time on calculation and less waiting on data. By accurately estimating this value, Omniwise gives developers insight into potential performance bottlenecks within their GPU kernels, enabling targeted optimisation efforts. The model is fine-tuned with LoRA (Low-Rank Adaptation), a technique that trains only small low-rank adapter matrices on top of a frozen base model, keeping the computational overhead of fine-tuning modest.
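The paper’s exact training recipe is not reproduced here, but a minimal LoRA setup with Hugging Face’s peft library might look like the sketch below. The base checkpoint name and the target attention projections are assumptions chosen for illustration, not details taken from the paper.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Hypothetical 3B-parameter code model; the paper's checkpoint is not assumed here.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-3B")

# LoRA injects small trainable rank-r matrices into selected layers,
# leaving the original weights frozen.
config = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # assumed attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```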

The researchers have also developed an online inference server and a Visual Studio Code plugin to ease integration into existing development workflows. These tools let developers incorporate LLM-based performance prediction directly into the coding process, so performance issues can be identified and addressed early in the development cycle.
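To make that workflow concrete, a client’s interaction with such an inference server might look like the following sketch. The endpoint URL, request fields, and response schema are hypothetical, since the paper’s server API is not described here.

```python
import json
import urllib.request

# A HIP/CUDA-style kernel to analyse (a standard SAXPY example).
KERNEL_SRC = """
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}
"""

# Hypothetical endpoint and payload; adjust to whatever the real server expects.
req = urllib.request.Request(
    "http://localhost:8000/predict",
    data=json.dumps({"kernel": KERNEL_SRC, "arch": "MI300X"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # e.g. {"gflops": ..., "arithmetic_intensity": ...}
```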

Future work includes expanding the scope of predicted performance metrics to cover power consumption and exploring application to a wider range of GPU architectures. Improving the model’s ability to handle complex kernel code and optimising predictions for specific hardware configurations are key areas for development. Expanding the training dataset to include more diverse and representative GPU kernels should also improve the model’s generalisation and accuracy across a broader spectrum of applications.

Researchers also plan to explore combining Omniwise with existing performance analysis tools, creating a more comprehensive and effective optimisation ecosystem. This integration will leverage the strengths of both LLM-based prediction and traditional profiling techniques, resulting in a more robust and efficient workflow.

Omniwise demonstrates the potential of applying large language models to traditionally non-linguistic domains, opening new avenues for innovation in computer systems and performance engineering. The system’s success highlights the ability of LLMs to learn complex relationships between code characteristics and hardware behaviour, enabling accurate performance prediction without code execution. This capability has significant implications for high-performance computing, machine learning, and data analytics.

The lightweight and model-agnostic design of Omniwise makes it easily adaptable to different GPU architectures and programming models. This flexibility allows developers to seamlessly integrate the system into their existing workflows without significant modifications. The system’s ability to provide accurate performance predictions early in the development cycle can significantly reduce the time and cost associated with performance optimisation.

Researchers envision a future where LLM-based performance prediction becomes an integral part of the software development process. By providing developers with early feedback on performance bottlenecks, Omniwise can help them write more efficient code and optimise their applications for maximum performance, leading to improvements in application performance and energy efficiency.

The development of Omniwise represents a step towards a more intelligent and automated approach to performance optimisation. By leveraging the power of large language models, researchers have created a system that can help developers unlock the full potential of their GPU-accelerated applications.

👉 More information
🗞 Omniwise: Predicting GPU Kernels Performance with LLMs
🧠 DOI: https://doi.org/10.48550/arXiv.2506.20886
