Robust watermarking, a technique for embedding hidden signals within data generated by artificial intelligence, faces fundamental limits that researchers now precisely define. Danilo Francati from Sapienza University of Rome, Yevin Nikhel Goonatilake from George Mason University, Shubham Pawar from Royal Holloway, University of London, and colleagues establish a threshold beyond which watermarks become undetectable, demonstrating that any scheme falters if more than half of the encoded bits are altered. The team introduces a new coding abstraction, termed messageless secret-key codes, to formalise the requirements for robust watermarking, including tamper detection and security. Crucially, they not only identify this limit but also construct efficient codes that achieve maximum robustness under standard cryptographic assumptions. Experimental results confirm that current watermarking techniques already operate at this critical threshold, offering a complete characterisation of the field.
Fragile Watermarks and AI-Generated Content
The central challenge in identifying AI-generated content lies in the fragility of existing watermarking techniques, which are often susceptible to removal through relatively simple manipulations or attacks. Watermarking aims to establish the origin of AI-generated content, combating misinformation, copyright infringement, and malicious use, and is increasingly seen as a requirement for compliance with emerging regulations. However, generative models possess remarkable power in recreating and transforming data, making it difficult to embed a signal that survives these transformations while remaining imperceptible. Current watermarks are often defeated by paraphrasing text, slight image manipulations, or simply regenerating the content, highlighting a fundamental trade-off between watermark strength, detectability, and content quality.
Various watermarking approaches have been explored, each with weaknesses. Hidden signatures, embedding information within the model’s latent space, are vulnerable to manipulations of that space. Paraphrasing-based watermarks, controlling specific phrasing, are easily defeated by rewording the text. Tree-ring watermarks, embedding patterns into the generation process, can be removed through regeneration or image manipulation. Techniques like latent diffusion model watermarks, universal adversarial signatures, optimized watermarks, frequency domain watermarks, and model-based watermarks share common vulnerabilities, including removal through regeneration, paraphrasing, image manipulation, adversarial attacks, and transfer attacks.
Researchers are investigating various attack strategies and potential defenses. Black-box attacks, which don’t require knowledge of the model’s internal workings, are often the most practical. Removal attacks aim to eliminate existing watermarks, while circumvention attacks generate content that avoids the watermark altogether. Potential defenses, such as robust watermark design, embedding watermarks in multiple modalities, developing detection mechanisms, model fingerprinting, retrieval-based defenses, and cryptographic approaches, often have limited effectiveness. Relying solely on watermarking for regulatory compliance may be insufficient, and attributing AI-generated content with certainty can be difficult, suggesting a need for robust detection mechanisms and alternative approaches to content authentication and provenance tracking.
Recent theoretical work establishes fundamental limits to watermarking generative AI, drawing an analogy to watermarks in sand, illustrating that any watermark will inevitably leave a detectable trace or be vulnerable to removal. Researchers argue that there are inherent limits on the amount of information that can be reliably embedded in generated content without affecting its quality or detectability, with significant implications for regulation, attribution, and the need to prioritize detection over reliance on watermarking alone. Future research should focus on developing more robust techniques, exploring alternative approaches to content authentication, and establishing reliable methods for tracking the origin of AI-generated content.
Watermark Robustness Limits in Generative Models
This research introduces a new framework for understanding the robustness of cryptographic watermarking in generative models, establishing precise limits on how much modification watermarked content can withstand before detection fails. Researchers introduced “messageless secret-key codes,” a coding abstraction, to formalize the requirements for robust watermarking: soundness, tamper detection, and pseudorandomness. This allowed them to rigorously define a threshold beyond which watermarks become unreliable. The core finding is that for binary outputs, any watermarking scheme will fail if more than half of the encoded bits are altered, while for an alphabet of size q, the threshold is a (1 − 1/q) fraction of the symbols.
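To see why one half is the barrier for binary outputs, recall the soundness requirement: a detector must reject content produced without the secret key, yet a uniformly random bit string already agrees with any fixed codeword on roughly half its positions. The short Python simulation below is purely illustrative (it is not code from the paper) and makes this tension concrete.

```python
import secrets

n = 4096  # codeword length in bits

def hamming(a, b):
    """Number of positions where two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))

codeword = [secrets.randbits(1) for _ in range(n)]  # stand-in for a pseudorandom codeword

# Soundness: an unrelated uniform string already sits at distance ~n/2,
# so the detector's acceptance radius must stay strictly below n/2.
unrelated = [secrets.randbits(1) for _ in range(n)]
print(hamming(codeword, unrelated) / n)             # ~0.5

# Robustness: an adversary who flips half the bits pushes the watermarked
# string to that same distance, indistinguishable from unrelated content.
tampered = [bit ^ 1 if i < n // 2 else bit for i, bit in enumerate(codeword)]
print(hamming(codeword, tampered) / n)              # exactly 0.5
```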
The researchers also construct matching linear-time codes that tolerate error rates approaching 1/2 in the binary case and 1 − 1/q in the q-ary case, representing a significant advance in watermark design. To validate these theoretical findings, they experimentally tested the limits of watermarking on recent image generation techniques, focusing on a method developed by Gunn, Zhao, and Song. Subjecting generated images to a simple crop-and-resize operation reliably flipped approximately half of the latent signs, effectively erasing the watermark and preventing successful codeword recovery while leaving the image visually intact. This experimental confirmation demonstrates that current watermarking schemes already operate at the edge of their robustness limits, and that further improvements will require fundamentally new approaches.
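The sign-flipping effect can be mimicked with a toy model (our illustration; the actual Gunn-Zhao-Song scheme and its detector are more sophisticated): a codeword carried in the signs of a latent vector survives low flip rates, but once roughly half the signs flip, agreement falls to the 50% chance level and recovery fails.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096

# Toy sign-embedded watermark: a secret codeword is carried by the signs
# of the initial latent vector used for generation.
codeword_signs = rng.choice([-1.0, 1.0], size=n)
latent = codeword_signs * np.abs(rng.standard_normal(n))

for flip_rate in (0.0, 0.25, 0.5):
    # Model a benign transformation (e.g. crop-and-resize) as flipping a
    # random subset of the recovered latent signs at the given rate.
    mask = rng.random(n) < flip_rate
    attacked = np.where(mask, -latent, latent)
    agreement = np.mean(np.sign(attacked) == codeword_signs)
    print(f"flip rate {flip_rate:.2f} -> sign agreement {agreement:.2f}")

# Agreement drops from 1.00 toward 0.50 (chance level) as the flip rate
# approaches one half, at which point the codeword cannot be recovered.
```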
The team employed a rigorous mathematical framework, modeling all algorithms as probabilistic polynomial-time and using Hamming distance to quantify the degree of modification, with standard notation for strings, sets, and randomness to model the behavior of watermarking schemes and attacks precisely. This analysis provides a complete characterization of robust watermarking: the precise threshold of failure, constructions that achieve it, and empirical evidence confirming its practical relevance.
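In outline, and with definitions reconstructed from the description above (the paper's exact formalism may differ), the tension between robustness and soundness can be written as follows:

```latex
% Hamming distance between strings x, y \in \Sigma^n over an alphabet of size q:
\mathrm{HD}(x, y) = \bigl|\{\, i \in [n] : x_i \neq y_i \,\}\bigr|

% Robustness against a p-fraction of modifications asks the detector to accept
% every y with \mathrm{HD}(x, y) \le p n, while soundness asks it to reject
% strings generated without the key. A uniformly random y over \Sigma^n has
%   \mathbb{E}[\mathrm{HD}(x, y)] = (1 - 1/q)\, n,
% so both properties can hold only when p < 1 - 1/q (i.e. p < 1/2 for bits).
```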
Watermark Robustness Limited by Bit Alteration Rate
Scientists have established a fundamental limit to the robustness of cryptographic watermarking for generative AI models. The result stems from the introduction of a new abstraction, messageless secret-key codes, which formalizes the essential requirements for robust watermarking: soundness, tamper detection, and pseudorandomness. Within this framework, the team proved that no watermark can reliably survive alterations to more than half of the encoded bits in binary systems, or to more than a (1 − 1/q) fraction of the symbols in q-ary systems, and that this limit is not merely theoretical but a concrete barrier to achieving greater robustness with current cryptographic techniques.
Conversely, they developed explicit constructions of messageless codes that approach these limits, achieving robustness up to just under half of the bits for binary systems and just under a (1 − 1/q) fraction of the symbols for q-ary systems. These codes, built from secure pseudorandom functions and a public counter, run in linear time and tolerate error rates just below the respective thresholds. Experiments focused on a recent state-of-the-art watermarking scheme for images, revealing that a simple crop-and-resize operation reliably flipped approximately half of the latent signs, effectively erasing the watermark while leaving the image visually intact. This demonstrates that the theoretical limit identified by the research is already being reached in practice, and it highlights a striking contrast between text and image modalities: for images, even benign transformations can erase the watermark without altering the content. The findings place previous impossibility results within a precise, quantitative framework, identifying the exact threshold at which robustness fails and providing constructions that achieve it. The work thus establishes a definitive characterization of cryptographic watermarking, suggesting that significant increases in robustness will require fundamentally new approaches beyond cryptographic pseudorandomness.
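As a minimal sketch of how a pseudorandom-function-plus-counter code might look, the snippet below uses HMAC-SHA256 as a stand-in for the secure PRF; the function names, parameters, and the 0.4 acceptance threshold are illustrative assumptions, not the paper's actual construction.

```python
import hmac
import hashlib

def encode(secret_key: bytes, counter: int, n_bits: int) -> list[int]:
    """Derive a fresh pseudorandom codeword from the key and a public counter.
    HMAC-SHA256 stands in for the secure pseudorandom function."""
    bits, block = [], 0
    while len(bits) < n_bits:
        msg = counter.to_bytes(8, "big") + block.to_bytes(8, "big")
        digest = hmac.new(secret_key, msg, hashlib.sha256).digest()
        bits.extend((byte >> i) & 1 for byte in digest for i in range(8))
        block += 1
    return bits[:n_bits]

def detect(secret_key: bytes, counter: int, received: list[int],
           threshold: float = 0.4) -> bool:
    """Accept if the received string disagrees with the regenerated codeword
    on fewer than a `threshold` fraction of positions; the threshold must
    stay below 1/2 so that unrelated strings (distance ~1/2) are rejected."""
    codeword = encode(secret_key, counter, len(received))
    errors = sum(a != b for a, b in zip(codeword, received))
    return errors / len(received) < threshold
```

In this sketch the counter plays the role of shared, non-secret state: it can be published alongside the content, and the detector regenerates the codeword from the key and the counter alone, so no codebook needs to be stored.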
👉 More information
🗞 The Coding Limits of Robust Watermarking for Generative Models
🧠 ArXiv: https://arxiv.org/abs/2509.10577
