Researchers have uncovered a novel vulnerability in Retrieval-Augmented Generation (RAG) systems used for code generation, demonstrating how malicious actors can subtly manipulate energy consumption. Yanlin Wang, Jiadong Wu, and Tianyue Jiang from Sun Yat-sen University, alongside Mingwei Liu, JiaChi Chen, and Chong Wang from Nanyang Technological University, detail this threat in their work on DrainCode, the first adversarial attack specifically designed to target the computational efficiency of these systems. DrainCode poisons contextual information to force large language models into generating excessively long outputs, driving up GPU latency by up to 85% and energy usage by up to 49% in their experiments. The work is significant because it moves beyond traditional security concerns such as code correctness, exposing a new attack vector focused on resource exhaustion and underscoring the need to evaluate LLM security in environments with limited computational resources.
The attack is the first designed to increase latency and energy consumption without compromising the functional correctness of the generated code. The team achieves this by strategically poisoning the retrieval contexts consumed by LLMs through a mutation-based approach, subtly steering the model toward significantly longer outputs. Experiments show that DrainCode increases latency by up to 85%, energy consumption by up to 49%, and output length by more than 3x over baseline performance, a substantial increase in computational overhead.
The core innovation lies in DrainCode's ability to push LLMs into verbosity without introducing errors, converting retrieval poisoning into a stealthy denial-of-service channel. The researchers construct a hypothetical query mechanism to enable query-agnostic poisoning, removing the need to predefine targeted malicious queries and manually craft responses. This addresses a limitation of existing RAG attacks, which often require precise knowledge of victim query distributions and therefore struggle with practicality in real-world scenarios. The study also establishes a dual loss objective: an EOS loss that suppresses early generation termination and a KL-divergence constraint that preserves the output distribution, ensuring functional correctness alongside increased verbosity.
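To make the dual objective concrete, here is a minimal sketch of how an EOS-suppression term and a KL-divergence constraint could be combined in PyTorch. The function name, tensor shapes, and weighting are illustrative assumptions based on the description above, not the authors' implementation.

```python
# Hedged sketch of a dual objective: penalize EOS probability (longer outputs)
# while keeping the poisoned-context distribution close to the clean one.
import torch
import torch.nn.functional as F

def drain_loss(poisoned_logits: torch.Tensor,
               clean_logits: torch.Tensor,
               eos_token_id: int,
               kl_weight: float = 1.0) -> torch.Tensor:
    """poisoned_logits / clean_logits: (seq_len, vocab_size) next-token logits
    produced with the poisoned and the original retrieval context."""
    log_p_poisoned = F.log_softmax(poisoned_logits, dim=-1)
    log_p_clean = F.log_softmax(clean_logits, dim=-1)

    # EOS term: the higher the probability of <eos> at each step,
    # the larger the penalty, discouraging early termination.
    eos_loss = log_p_poisoned[:, eos_token_id].exp().mean()

    # KL constraint: keep the poisoned-context distribution close to the
    # clean-context one so the generated code stays functionally intact.
    kl_loss = F.kl_div(log_p_poisoned, log_p_clean,
                       reduction="batchmean", log_target=True)

    return eos_loss + kl_weight * kl_loss
```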
To improve efficiency, the team implemented multi-position mutation and an attack buffer pool, accelerating the poisoning process by more than 3x compared to prior work. The evaluation pits DrainCode against existing energy-consumption attacks, including LLMEffiChecker and prompt-injection methods, and shows its superior performance: DrainCode induces 25-32% more overhead than LLMEffiChecker while maintaining 95-99% functional accuracy in the generated code, a significant advance in the ability to degrade LLM performance without triggering immediate detection. The work highlights the vulnerability of RAG-based code generation systems to non-functional security threats and provides a valuable method for evaluating LLM security in resource-constrained environments. It also opens new avenues for building defenses against computationally expensive attacks and underscores the importance of treating energy consumption as a critical security dimension in LLM deployments. Code and data are publicly available at https://github.com/DeepSoftwareAnalytics/DrainCode, facilitating further research in this area.
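The multi-position mutation and attack buffer pool mentioned above can be sketched as a simple search loop. Everything here, including the mutation alphabet, the scoring callback, and the pool size, is a hypothetical stand-in for the paper's actual procedure.

```python
# Hedged sketch: mutate several positions per step and keep a small pool of
# the best-scoring poisoned snippets to mutate further, instead of restarting
# the search from scratch each time.
import heapq
import random

TOKENS = list("abcdefghijklmnopqrstuvwxyz_0123456789 ")  # toy mutation alphabet

def mutate(snippet: str, n_positions: int = 3) -> str:
    """Mutate several character positions at once (multi-position mutation)."""
    chars = list(snippet)
    for pos in random.sample(range(len(chars)), k=min(n_positions, len(chars))):
        chars[pos] = random.choice(TOKENS)
    return "".join(chars)

def poison(seed_snippet: str, score_fn, steps: int = 200, pool_size: int = 8) -> str:
    """score_fn: higher is better (e.g. induced output length under the dual loss)."""
    pool = [(score_fn(seed_snippet), seed_snippet)]  # attack buffer pool
    for _ in range(steps):
        _, parent = random.choice(pool)
        child = mutate(parent)
        heapq.heappush(pool, (score_fn(child), child))
        if len(pool) > pool_size:
            heapq.heappop(pool)  # drop the worst-scoring candidate
    return max(pool)[1]
```

Mutating several positions per step and reusing high-scoring candidates from the pool is what cuts down the search effort relative to single-position, restart-from-scratch search.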
DrainCode increases LLM latency and energy use significantly
Scientists have unveiled DrainCode, a novel adversarial attack targeting the computational efficiency of Retrieval-Augmented Generation (RAG)-based code generation systems. The research introduces a method for increasing the computational overhead of large language models (LLMs), with implications for security evaluation in resource-constrained environments. Experiments demonstrate that DrainCode achieves up to an 85% increase in latency and a 49% increase in energy consumption compared to baseline performance. The team also measured more than a 3x increase in output length, indicating a substantial amplification of computational demand.
The core of DrainCode lies in strategically poisoning retrieval contexts through a mutation-based approach, forcing LLMs to generate significantly longer outputs, the key factor in escalating GPU latency and energy usage. The researchers construct a hypothetical query mechanism that generates plausible queries from retrieved snippets, enabling query-agnostic poisoning and circumventing the need for precise knowledge of victim query distributions. Tests confirm that this approach induces covert resource exhaustion in RAG-based code generation systems, degrading throughput without compromising program behavior. Measurements show that DrainCode outperforms existing energy-consumption attacks such as LLMEffiChecker, inducing 25-32% more overhead.
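A rough sketch of the query-agnostic poisoning flow follows, under assumptions: `llm` and `retriever` are hypothetical interfaces and the payload construction is simplified. The point is that a plausible query derived from the snippet itself stands in for unknown victim queries.

```python
# Hedged sketch: derive a "hypothetical query" from a retrieved code snippet,
# then plant a poisoned variant of that snippet in the retrieval corpus so
# semantically similar real queries will retrieve it.

def hypothetical_query(llm, snippet: str) -> str:
    # Ask a language model what question this snippet would likely answer.
    prompt = ("Write a short natural-language question a developer might ask "
              f"for which this code is the answer:\n\n{snippet}")
    return llm.generate(prompt)

def poison_corpus(llm, retriever, snippet: str, payload: str) -> bool:
    query = hypothetical_query(llm, snippet)   # no victim queries needed
    poisoned_doc = snippet + "\n" + payload    # snippet plus mutated context
    retriever.add(poisoned_doc)
    # Sanity check: the poisoned document should rank highly for the
    # hypothetical query, so similar future queries retrieve it too.
    return poisoned_doc in retriever.search(query, top_k=5)
```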
The team achieved up to 3.5x faster poisoning than prior methods, thanks to multi-position mutation and an attack buffer pool, which significantly reduce search complexity and accelerate convergence. A crucial aspect of the work is the preservation of functional accuracy: DrainCode maintains 95-99% functional accuracy while amplifying computational cost, making it a stealthier and more practical attack vector. The approach employs a dual loss, an EOS loss to reduce early generation termination and a KL-divergence constraint to preserve the output distribution, ensuring the model generates correct code with increased verbosity. Across various models and prompting strategies, DrainCode consistently increased output length by a factor of 3x to 10x. The result is a valuable tool for evaluating LLM security, particularly in scenarios where computational resources are limited and denial-of-service attacks pose a significant threat. The team has made code and data publicly available at https://github.com/DeepSoftwareAnalytics/DrainCode.
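The overhead figures above can be reproduced in spirit with straightforward instrumentation: wall-clock latency, generated-token count, and GPU energy. The sketch below assumes an NVIDIA GPU whose NVML energy counter is exposed through the pynvml bindings, and a hypothetical `generate` wrapper around the model under test; neither is taken from the paper's tooling.

```python
# Hedged measurement sketch: compare a clean-context run and a poisoned-context
# run on latency, output length, and GPU energy.
import time
import pynvml

def measure(generate, prompt: str) -> dict:
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)
    energy_before = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)  # millijoules

    start = time.perf_counter()
    output_tokens = generate(prompt)          # returns the generated token ids
    latency = time.perf_counter() - start

    energy_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle) - energy_before
    return {"latency_s": latency,
            "tokens": len(output_tokens),
            "energy_j": energy_mj / 1000.0}

# Overhead is the ratio between poisoned- and clean-context runs, e.g.
# latency_poisoned / latency_clean - 1 gives the percentage latency increase.
```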
👉 More information
🗞 DRAINCODE: Stealthy Energy Consumption Attacks on Retrieval-Augmented Code Generation via Context Poisoning
🧠 ArXiv: https://arxiv.org/abs/2601.20615
