Automated program repair, the process of automatically fixing errors in computer code, has seen significant advances thanks to large language models, but these powerful tools demand substantial computing resources. Researchers at Kyushu University, led by Kazuki Kusama, Honglin Shu, and Masanari Kondo, now demonstrate that smaller language models can achieve comparable, and sometimes even superior, bug-fixing accuracy. Their work challenges the assumption that size is paramount for success in this field, revealing that carefully designed small language models offer a viable alternative to their larger counterparts. Importantly, the team also shows that reducing the precision of these models through a technique called int8 quantization dramatically lowers memory requirements with minimal impact on repair performance, paving the way for more accessible and efficient automated program repair tools.
Small Models Rival Large Models for Repair
Small Language Models (SLMs) are becoming increasingly attractive for Automated Program Repair (APR) due to their lower computational demands and reduced need for training data compared to Large Language Models (LLMs). This research investigates whether SLMs can achieve competitive performance in APR, offering a practical alternative to more resource-intensive LLMs. Experiments using the QuixBugs benchmark directly compared the bug-fixing accuracy of SLMs and LLMs under identical conditions. The results demonstrate that state-of-the-art SLMs can fix bugs as accurately as, or even more accurately than, their LLM counterparts, and that int8 quantization has minimal effect on APR accuracy while significantly reducing memory requirements. These findings suggest that SLMs present a viable alternative to LLMs for APR, offering competitive accuracy at lower computational cost, and that quantization can further improve their efficiency without compromising performance.
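To make the setup concrete, the sketch below shows one plausible way to load an SLM with int8 weights and prompt it for a repair, using Hugging Face transformers with bitsandbytes. The checkpoint, the prompt template, and the buggy example (modelled on QuixBugs' gcd program) are illustrative assumptions, not the authors' published harness.

```python
# Illustrative sketch (not the paper's code): load a small model with int8
# weights via bitsandbytes and ask it to repair one buggy function.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "microsoft/Phi-3-mini-4k-instruct"  # assumed 3.8B-parameter checkpoint

# int8 quantization: weights stored in 8 bits instead of 32, roughly a 4x
# reduction in weight memory; the generation logic is unchanged.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# A QuixBugs-style defect: the recursive call's arguments are swapped.
buggy = """def gcd(a, b):
    if b == 0:
        return a
    return gcd(a % b, b)
"""
prompt = f"Fix the bug in this Python function.\n\n{buggy}\nFixed version:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```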
Quantized Small Language Models for Software Engineering
This research investigates the potential of Small Language Models (SLMs), specifically quantized versions of models like Phi-3, Llama 3, and Qwen2.5-Coder, as viable alternatives to larger models for software engineering tasks. The core argument is that SLMs, when properly quantized, can achieve performance comparable to larger models while significantly reducing computational costs and environmental impact. Key findings include that SLMs with 7 billion parameters can match much larger models (70 billion+ parameters) on various software engineering benchmarks. Quantization, which reduces the precision of model weights, is essential for making SLMs practical: techniques like GPTQ and 4-bit quantization effectively shrink model size and inference cost without significant performance degradation. SLMs offer substantial benefits in memory footprint, inference speed, and energy consumption, making them more accessible for resource-constrained environments and enabling deployment on edge devices. The research focuses on software engineering tasks, including code completion, bug fixing, code generation, and issue resolution, using benchmarks like SWE-bench to evaluate performance.
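For the 4-bit route, one common recipe is the NF4 format in bitsandbytes, while GPTQ usually ships as pre-quantized checkpoints. The snippet below is an illustration of that recipe, not the authors' tooling; the model id is an assumption.

```python
# Illustrative 4-bit loading with bitsandbytes NF4; GPTQ is an alternative
# that typically loads pre-quantized checkpoints. Model id is an assumption.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat weight format
    bnb_4bit_compute_dtype=torch.float16,  # matrix multiplies still run in fp16
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-7B-Instruct",
    quantization_config=bnb,
    device_map="auto",
)
print(f"weights in memory: {model.get_memory_footprint() / 1e9:.1f} GB")
```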
Small Models Rival Large Models at Bug Fixing
Researchers have demonstrated that small language models (SLMs) can achieve bug-fixing accuracy comparable to, and in some cases exceeding, that of much larger language models (LLMs) in automated program repair (APR). This breakthrough addresses a significant limitation of LLM-based APR, which traditionally demands substantial computational resources. The team conducted experiments using the QuixBugs benchmark, evaluating 14 SLMs and comparing their performance against two LLMs, revealing that the best-performing SLM, Phi-3 (3.8 billion parameters), successfully fixed 38 out of 40 bugs, closely matching the 39 out of 40 fixed by the top-performing LLM, Codex. This research highlights a viable alternative to computationally expensive LLMs for APR tasks, offering competitive accuracy with significantly reduced resource requirements.
Furthermore, the team investigated the impact of quantization and found that int8 quantization had a minimal effect on repair accuracy, with differences of approximately +0.5 bugs compared to the full-precision float32 representation. This means developers can further reduce memory usage and accelerate inference without substantially compromising bug-fixing effectiveness. The findings demonstrate that code-specific SLMs, combined with int8 quantization, can deliver performance comparable to LLM-based APR methods while requiring fewer computational resources, paving the way for practical deployment in everyday software development workflows. This work represents the first comprehensive evaluation of 14 SLMs in APR, providing valuable insights into their capabilities and limitations.
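A QuixBugs-style evaluation reduces to a simple loop: for each of the 40 buggy programs, generate a candidate patch and count it as fixed only if the benchmark's tests pass. The sketch below assumes the QuixBugs repository layout and a generate_fix() stub (which could wrap a model call like the one shown earlier); both are illustrative, not the authors' harness.

```python
# Sketch of a QuixBugs-style scoring loop. The directory layout and the
# generate_fix() helper are assumptions; plug in a real model call.
import pathlib
import subprocess

def generate_fix(buggy_code: str) -> str:
    """Hypothetical helper: return the model's candidate repair."""
    raise NotImplementedError  # e.g. the int8 generation sketch above

def tests_pass(name: str, candidate: str) -> bool:
    """Write the candidate over the buggy program and run its tests.
    (A real harness would patch a scratch copy, not the benchmark files.)"""
    pathlib.Path(f"QuixBugs/python_programs/{name}.py").write_text(candidate)
    result = subprocess.run(
        ["python", "-m", "pytest", f"QuixBugs/python_testcases/test_{name}.py"],
        capture_output=True,
    )
    return result.returncode == 0

programs = sorted(p.stem for p in pathlib.Path("QuixBugs/python_programs").glob("*.py"))
fixed = 0
for name in programs:  # 40 buggy programs in the Python half of QuixBugs
    buggy = pathlib.Path(f"QuixBugs/python_programs/{name}.py").read_text()
    if tests_pass(name, generate_fix(buggy)):
        fixed += 1
print(f"fixed {fixed}/{len(programs)} bugs")
```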
Small Models Rival Large Ones for Program Repair
This research demonstrates that small language models (SLMs) offer a compelling alternative to large language models (LLMs) for automated program repair. Experiments on standard benchmarks reveal that the most effective SLMs achieve bug-fixing accuracy comparable to, and in some cases exceeding, that of larger models. This finding is significant because SLMs require substantially fewer computational resources, making them practical for use in everyday development environments. Furthermore, the study confirms that applying int8 quantization, a technique for reducing model size, has minimal impact on repair accuracy while significantly decreasing memory requirements. This suggests that SLMs can be made even more efficient without sacrificing effectiveness.
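The memory savings are easy to sanity-check with back-of-envelope arithmetic: at four bytes per float32 weight, a 3.8-billion-parameter model in the Phi-3 class needs roughly 15 GB for weights alone, versus about 3.8 GB at one byte per int8 weight.

```python
# Weight-storage arithmetic for a 3.8B-parameter model (Phi-3 class).
params = 3.8e9
print(f"float32: {params * 4 / 1e9:.1f} GB")  # ~15.2 GB, 4 bytes per weight
print(f"int8:    {params * 1 / 1e9:.1f} GB")  # ~3.8 GB, 1 byte per weight
```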
👉 More information
🗞 How Small is Enough? Empirical Evidence of Quantized Small Language Models for Automated Program Repair
🧠 arXiv: https://arxiv.org/abs/2508.16499
