Google open-sources technology to watermark and detect AI-generated text
Google has made its SynthID Text technology, which allows developers to watermark and detect AI-generated text, widely available. This tool can now be downloaded from the AI platform Hugging Face and Google’s updated Responsible GenAI Toolkit.
Image Credits: Lisa Fotios / Pexels
In a post on X, Google announced, “We’re open-sourcing our SynthID Text watermarking tool,” making it free for developers and businesses to help them identify AI-generated content.
So, how does SynthID Text work? Generative text models predict tokens (words or characters) one at a time, assigning each candidate a probability score based on the preceding prompt and output. SynthID Text embeds extra information into these scores by subtly adjusting the likelihood that specific tokens are chosen. The resulting pattern of token choices acts as the watermark: a detector can compare the statistics of a piece of text against what an unwatermarked model would produce to determine whether a watermarked model generated it.
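The general token-biasing idea behind this style of watermark can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Google's implementation: the vocabulary size, the hash-based "green list" split, and the bias value are all invented for demonstration.

```python
import hashlib
import random

VOCAB = 1000  # toy vocabulary size (assumption for illustration)

def is_green(prev_tok: int, tok: int) -> bool:
    # Pseudorandomly split the vocabulary 50/50, keyed on the previous token,
    # so the "favored" half changes at every position.
    h = hashlib.sha256(f"{prev_tok}:{tok}".encode()).digest()
    return h[0] % 2 == 0

def sample_watermarked(scores: list[float], prev_tok: int, bias: float = 2.0) -> int:
    # Nudge the scores of "green" tokens upward before picking the best one.
    adjusted = [s + (bias if is_green(prev_tok, t) else 0.0)
                for t, s in enumerate(scores)]
    return max(range(len(adjusted)), key=adjusted.__getitem__)

def green_fraction(tokens: list[int]) -> float:
    # Detection: count how often each token falls in its context's green list.
    # Watermarked text lands well above the ~0.5 expected by chance.
    hits = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)
```

Generating a few hundred tokens with `sample_watermarked` pushes the green fraction far above 0.5, while ordinary text hovers near it; this statistical gap, rather than any visible marker, is what the detector measures. It also shows why very short text is hard to verify: with few tokens, the gap is indistinguishable from chance.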
According to Google, SynthID Text has been integrated into its Gemini models since earlier this year and doesn’t compromise text quality, accuracy, or speed. It can even detect AI-generated text that has been paraphrased, cropped, or altered. However, the tool does have limitations. It struggles with short text, translations, or factual prompts like “What is the capital of France?” because there’s little room to adjust token distributions without losing accuracy.
Google isn’t alone in this space—OpenAI has also been researching watermarking for AI-generated text but has delayed releasing its tools due to technical and commercial concerns.
If broadly adopted, watermarking tools like SynthID could address the issue of inaccurate “AI detectors” that falsely flag legitimate work. However, widespread adoption remains uncertain, as companies may introduce competing standards. Regulatory pressure may push developers to act, as countries like China and U.S. states like California are considering or implementing mandatory watermarking for AI-generated content.
The urgency around this issue is growing: an EU law enforcement report warns that by 2026, as much as 90% of online content could be AI-generated, raising new concerns about disinformation and fraud. Already, nearly 60% of all sentences online may be AI-generated or AI-translated, largely due to the use of machine translation tools, according to an AWS study.