OpenAI's GPT-5.5 Tops UK AI Safety Institute Cybersecurity Evaluation, Outpacing Anthropic's Mitos

OpenAI's latest generative AI model, GPT-5.5, has achieved the highest score in a cybersecurity capability evaluation conducted by the UK government's AI Safety Institute (AISI). According to a report published May 17 (local time) on the AISI official website, GPT-5.5 recorded an average pass rate of 71.4% across 95 expert-level cybersecurity tasks spanning cryptography, web attacks, reverse engineering, exploit development, and vulnerability research. That result outperforms its predecessor, GPT-5.4 (52.4%), as well as Anthropic's Claude Mitos Preview (68.6%) and Claude Opus 4.7 (48.6%). GPT-5.5 also became only the second model, after Mitos, to complete 'The Last One,' a 32-step enterprise network penetration simulation that AISI designed to test the threat capabilities of autonomous AI agents. It succeeded on 2 of 10 attempts, versus 3 successes for Mitos.

Why it matters: This result raises the frontier cybersecurity capability baseline for foundation models and fits a recurring pattern of rapid capability escalation driven by gains in coding, reasoning, and long-horizon autonomy. The head-to-head comparison between OpenAI and Anthropic feeds directly into the ongoing debate over which safety-oriented lab is producing the most capable, and potentially most dangerous, frontier models. GPT-5.5 surpasses Mitos on average but trails it on the hardest simulation (2 completions versus 3), which suggests the competitive gap is narrow and not yet conclusive.

Grounded take: The AISI evaluation is a controlled laboratory benchmark, not a measure of the safety of deployed products, as the institute itself noted. Still, a model that can autonomously execute a multi-stage cyberattack that would take a human roughly 20 hours points to a structural shift: offensive cybersecurity work is being compressed into AI-agent time scales. Frontier labs are locked in a race in which safety evaluations themselves become competitive scoreboards. For enterprises, this reinforces the need for defensive AI agents and monitor-first deployment strategies, since the underlying model capability cycle shows no sign of plateauing.

#OpenAI #GPT-5.5 #AI-Safety #Cybersecurity
