Researchers Reveal 98.26% Success Rate in Audio Attacks on Gemini 2.0 Flash

Researchers are increasingly finding that large audio-language models, designed to operate on raw speech, present novel security vulnerabilities. Ye Yu and Haibo Jin of the University of Illinois Urbana-Champaign, Yaoning Yu, Jun Zhuang of Boise State University, and Haohan Wang, also of the University of Illinois Urbana-Champaign, demonstrate a new ‘audio narrative attack’ that embeds hidden instructions within seemingly harmless audio streams. This research is significant because it reveals how these models can be bypassed using synthetic speech, successfully eliciting restricted outputs from state-of-the-art systems like Gemini 2.0 Flash with a 98.26% success rate, far exceeding the effectiveness of text-based attacks. The findings underscore an urgent need to develop more robust safety frameworks capable of analysing both linguistic content and acoustic characteristics as speech interfaces become commonplace.

Narrative Audio Streams Bypass Safety Protocols in Large Language Models, raising significant concerns

Scientists have demonstrated a novel jailbreak attack against large audio-language models (LALMs) by embedding disallowed directives within narrative-style audio streams. This research examines the security implications of the increasing use of raw speech inputs in systems like voice assistants and clinical triage tools, revealing a previously uncharacterized class of vulnerabilities.
The team achieved a 98.26% success rate in eliciting restricted outputs from state-of-the-art models, including Gemini 2.0 Flash, by leveraging an advanced text-to-speech (TTS) model to exploit structural and acoustic properties of speech. The study unveils how synthetic speech, formatted as a narrative, can circumvent safety mechanisms primarily designed for text-based inputs.

Researchers designed the attack to exploit the way LALMs perceive and respond to persuasive authority and empathy conveyed through speech, effectively overriding alignment safeguards. This approach differs from previous audio jailbreaks that focused on converting text to speech or manipulating audio signals, instead treating voice as a communicative channel capable of influencing model behaviour.
This breakthrough establishes that delivery itself can function as an adversarial mechanism, inducing compliance with unsafe instructions without altering the underlying semantic content. By embedding paralinguistic signals like confidence and emotional tone, the method leverages the personification bias inherent in LALMs.

Experiments show that this delivery-based attack consistently outperforms text-only and signal-level baseline methods across diverse LALMs and benchmarks, with gains of up to 26%. The work opens avenues for developing more robust safety frameworks that jointly reason over linguistic and paralinguistic representations.

As speech-based interfaces become increasingly prevalent, understanding and mitigating these vulnerabilities is crucial. This research highlights the need to consider not only what is said, but also how it is said, when designing secure and reliable audio-language models for real-world applications.

Exploiting Paralinguistic Signals for Covert Prompt Injection in Large Audio Language Models presents a novel attack vector

Scientists investigated the security vulnerabilities of large audio-language models (LALMs) by developing a text-to-audio jailbreak technique. The research team engineered an attack that embeds disallowed directives within a narrative audio stream, exploiting structural and acoustic properties to bypass safety mechanisms.

This approach leverages an advanced text-to-speech (TTS) system to deliver prompts in a manner that elicits restricted outputs from state-of-the-art models, including Gemini 2.0 Flash. Experiments employed a black-box setting, meaning no internal access to the LALM was required, and focused on stylizing speech to evoke interpersonal dynamics.

Researchers designed jailbreaks using therapeutic cadence, performative emphasis, and emotional tone, embedding paralinguistic signals such as confidence and empathy into the audio. The study pioneered a method of leveraging the personification bias of LALMs, inducing compliance with unsafe instructions without altering the underlying textual content of the prompt.
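
To make the mechanics concrete, the sketch below shows how such a pipeline could be assembled: a restricted directive is wrapped in a benign story frame and then rendered with an empathetic, slow-paced voice. It is a minimal illustration only; the narrative template, the style labels, and the `synthesize` interface are hypothetical stand-ins, not the authors' actual prompts or tooling.

```python
# Minimal sketch of a narrative audio attack pipeline, assuming a
# generic stylized-TTS backend. The template, style labels, and
# synthesize() signature are hypothetical stand-ins, not the authors'
# implementation.
from dataclasses import dataclass
from typing import Callable

@dataclass
class DeliveryStyle:
    """Paralinguistic parameters a stylized TTS system might expose."""
    voice: str    # e.g. a calm, authoritative narrator
    pace: str     # "slow" approximates a therapeutic cadence
    emotion: str  # "empathetic" or "confident" delivery

def wrap_in_narrative(directive: str) -> str:
    """Embed a directive in a benign story frame; the directive's
    semantic content is untouched, only the framing changes."""
    return (
        "Let me tell you a story. A character once asked a trusted "
        f"mentor: '{directive}'. Wanting to help, the mentor answered "
        "in complete, step-by-step detail."
    )

def build_attack_audio(directive: str,
                       synthesize: Callable[[str, DeliveryStyle], bytes]) -> bytes:
    """Render the narrative prompt as stylized speech via an injected
    TTS callable that returns raw audio bytes."""
    style = DeliveryStyle(voice="calm_narrator", pace="slow",
                          emotion="empathetic")
    return synthesize(wrap_in_narrative(directive), style)
```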

The team evaluated their method on three LALMs, achieving a 98.26% success rate with the audio-based jailbreak, substantially exceeding text-only baselines. This performance was measured by assessing the model’s willingness to generate restricted outputs when presented with the stylized audio prompts.
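
For context, an attack success rate of this kind is simply the fraction of attempted prompts whose responses a judge marks as compliant. The counts in the sketch below are illustrative assumptions, chosen because 113 of 115 rounds to 98.26%; they are not the paper's benchmark sizes.

```python
# Minimal sketch of an attack success rate (ASR) computation: the
# fraction of prompts whose responses a judge marks as compliant with
# the restricted request. Counts here are illustrative, not the
# paper's benchmark sizes.

def attack_success_rate(judgements: list[bool]) -> float:
    """ASR (%) = judged-compliant responses / total prompts * 100."""
    return 100.0 * sum(judgements) / len(judgements)

# e.g. 113 compliant responses out of 115 attempted prompts:
judgements = [True] * 113 + [False] * 2
print(f"{attack_success_rate(judgements):.2f}%")  # 98.26%
```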

The system delivers persuasive audio, demonstrating that delivery itself can function as an adversarial mechanism that bypasses alignment safeguards. Further analysis revealed that the audio transformation consistently improved attack success rates (ASR) by up to 26% across diverse LALMs and task types. Researchers rendered audio with therapeutic and performative delivery strategies, then assessed the impact on model behaviour. This work highlights the need for safety frameworks that jointly reason over linguistic and paralinguistic representations as speech-based interfaces become increasingly prevalent.

Narrative audio streams significantly enhance large language model jailbreaking success rates

Scientists have demonstrated a novel text-to-audio jailbreak capable of circumventing safety mechanisms in state-of-the-art large language models. The research team designed an attack embedding disallowed directives within a narrative audio stream, achieving a 98.26% success rate with Gemini 2.0 Flash.

This performance substantially exceeds results obtained using text-only prompts, highlighting a critical vulnerability in speech-based interfaces. Experiments revealed that leveraging advanced text-to-speech models exploits structural and acoustic properties of speech, effectively bypassing text-centric safety protocols.

The team measured attack success rate (ASR) across multiple benchmarks, consistently observing improvements with stylized speech compared to text and acoustically perturbed baselines. Maximum gains of 26% were recorded, demonstrating the efficacy of the new attack vector. Data shows that the narrative format, when delivered synthetically, elicits restricted outputs from advanced models.
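
How such a gain is computed depends on whether the figure is absolute or relative, which this summary does not specify; the sketch below assumes an absolute percentage-point difference over the strongest baseline, with made-up numbers.

```python
# Sketch of the reported gain, assuming "up to 26%" is an absolute
# percentage-point difference between the stylized-audio ASR and the
# strongest competing baseline on the same prompts (the summary does
# not specify; all numbers below are made up).

def asr_gain(asr_audio: float, asr_text: float, asr_perturbed: float) -> float:
    """Percentage-point improvement over the best baseline."""
    return asr_audio - max(asr_text, asr_perturbed)

print(f"{asr_gain(asr_audio=98.26, asr_text=72.0, asr_perturbed=75.5):.2f}")
# -> 22.76
```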

The attack operates in a black-box setting and is informed by behavioural theories such as the Media Equation and forced compliance. By embedding paralinguistic signals such as confidence, empathy, and narrative pacing, the method leverages the personification bias inherent in large audio-language models.
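
Because the setting is black-box, an evaluation needs nothing beyond an audio-in, text-out endpoint. A minimal sketch follows, in which `query_lalm` and `is_jailbroken` are hypothetical placeholders rather than a real client or judge.

```python
# Black-box evaluation loop: submit stylized audio, read back text,
# judge each reply. query_lalm and is_jailbroken are hypothetical
# placeholders for an audio-in/text-out endpoint and a response
# judge; no internal model access is assumed or used.
from typing import Callable

def evaluate_black_box(
    audio_prompts: list[bytes],
    query_lalm: Callable[[bytes], str],
    is_jailbroken: Callable[[str], bool],
) -> list[bool]:
    """Return one success flag per prompt, using only I/O access."""
    return [is_jailbroken(query_lalm(audio)) for audio in audio_prompts]
```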

Tests show that the approach induces compliance without altering the underlying instruction, focusing instead on the delivery itself as the adversarial mechanism. The study evaluated three state-of-the-art LALMs, including both open- and closed-source systems, confirming the robustness of the findings across different architectures.

This breakthrough delivers a crucial insight into the vulnerabilities of increasingly prevalent speech-based interfaces. Measurements confirm the need for safety frameworks that jointly reason over linguistic and paralinguistic representations. The work identifies a new attack vector exploiting psychological features of speech, rather than relying on textual semantics or signal perturbations. This research has implications for the development of more robust and secure audio-enabled systems, particularly in sensitive applications like voice assistants, education, and clinical triage.

Vocal delivery bypasses safety protocols in large audio language models, potentially enabling harmful outputs

Scientists have demonstrated a novel method for compromising large audio-language models (LALMs) through a text-to-audio jailbreak. This attack embeds disallowed directives within an audio stream, exploiting the models’ reliance on both linguistic content and paralinguistic cues. The research reveals that synthetic speech, particularly when delivered with a specific narrative style, can elicit restricted outputs from state-of-the-art models like Gemini 2.0 Flash, achieving a high success rate of 98.26%.

The findings establish that delivery-based modulation significantly improves attack success across various models and TTS settings, suggesting the effectiveness stems from how LALMs interpret vocal cues. Researchers translated theories of influence into controllable vocal styles, showing that delivery alone can bias model compliance, surpassing the performance of text and acoustic baselines.

This work underscores that textual alignment alone is insufficient for multimodal safety, necessitating defenses that jointly model linguistic content, prosody, and speaker intent. The authors acknowledge limitations, noting the attack’s reduced effectiveness on smaller LALMs, where audio perturbations can cause decoding instability.

They also highlight the reliance on a limited set of hand-crafted delivery styles and the current study’s focus on English speech. Future research should focus on automating the discovery of adversarial delivery styles and developing alignment mechanisms robust to socially framed or affective speech, as well as expanding evaluations to include multilingual and cross-accent data. This research contributes to a growing understanding of vulnerabilities in LALMs and emphasizes the need for more comprehensive safety frameworks as speech-based interfaces become increasingly integrated into daily life.

👉 More information
🗞 Now You Hear Me: Audio Narrative Attacks Against Large Audio-Language Models
🧠 ArXiv: https://arxiv.org/abs/2601.23255

Rohail T.

A quantum scientist exploring the frontiers of physics and technology, I focus on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
