OpenAI GPT-5.5 tops Agents' Last Exam, beating Anthropic Claude Fable 5
The AMW Read
Updates the competitive landscape in the frontier model segment and shifts the open debate between OpenAI and Anthropic on agentic capability.
OpenAI's GPT-5.5 has achieved the highest scores on the newly released Agents' Last Exam benchmark, surpassing Anthropic's Claude Fable 5. The benchmark focuses on multi-part instruction adherence, testing models on complex, long-horizon reasoning tasks that simulate real agent workflows. This marks a notable shift in the frontier model leaderboard.
This outcome updates the ongoing debate between OpenAI and Anthropic over which approach produces superior agentic performance: OpenAI's emphasis on general-purpose reinforcement learning or Anthropic's safety-first constitutional AI method. The win supports OpenAI's continued investment in model scale and training infrastructure, and it signals that agentic capability, not just raw chat competence, is becoming the defining competitive axis.
For investors and enterprise buyers, the result reinforces the value of benchmark-driven procurement for agent workloads. OpenAI's lead on this benchmark may accelerate migration from competitors for complex automation use cases. However, a single benchmark result should be weighed alongside each model's overall safety and cost profile.

