
Technology
1 min read
The AMW Read
The article updates the Gemini case study with specific error-rate benchmarks and highlights the structural risk of ungrounded outputs at hyperscaler scale, directly impacting the safety/alignment discourse.
Foundation Models · Player Map · Safety / Alignment
Oumi’s analysis for the NYT shows Google’s AI Overviews are 90% accurate, but with ~5 trillion queries a year that still means ~57 million wrong answers each hour (≈950 k per minute). Accuracy on the Simple QA benchmark rose from 85% (Gemini 2) to 95% (Gemini 3), yet over half of the correct answers are ungrounded. This scale-level misinformation forces tighter verification layers and could reshape trust in search-centric AI.
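The headline figures can be sanity-checked with back-of-envelope arithmetic; a minimal sketch, taking the article's ~5 trillion queries a year and 10% error rate (90% accuracy) as given:

```python
# Back-of-envelope check of the error volume implied by the cited figures.
# Inputs are the article's estimates, not measured values.
QUERIES_PER_YEAR = 5_000_000_000_000  # ~5 trillion queries/year
ERROR_RATE = 0.10                     # 90% accurate => 10% wrong

wrong_per_year = QUERIES_PER_YEAR * ERROR_RATE   # 500 billion/year
wrong_per_hour = wrong_per_year / (365 * 24)     # ~57 million/hour
wrong_per_minute = wrong_per_hour / 60           # ~950 thousand/minute

print(f"{wrong_per_hour:,.0f} wrong answers per hour")
print(f"{wrong_per_minute:,.0f} wrong answers per minute")
```

Note the per-minute figure this yields is roughly 950 k, consistent with the ~57 million-per-hour claim.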

