OpenAI GPT-5.5 Tops AI Safety Institute Cybersecurity Evaluation, Outpaces Anthropic Mitos
Technology
US

The AMW Read

Updates the frontier cybersecurity capability baseline between two top-tier labs; the head-to-head GPT-5.5 vs. Mitos outcome directly informs the open safety-capability debate (§7.2).
Foundation Models · Player Map; Foundation Models · Open Debates

OpenAI's latest generative AI model, GPT-5.5, has achieved the highest score in a cybersecurity capability evaluation conducted by the UK government's AI Safety Institute (AISI). According to a report published May 17 (local time) on the AISI official website, GPT-5.5 recorded an average pass rate of 71.4% across 95 expert-level cybersecurity tasks spanning cryptography, web attacks, reverse engineering, exploit development, and vulnerability research. That puts it ahead of its predecessor, GPT-5.4 (52.4%), as well as Anthropic's Claude Mitos Preview (68.6%) and Claude Opus 4.7 (48.6%). Notably, GPT-5.5 became only the second model to complete 'The Last One,' a 32-step enterprise network penetration simulation that AISI designed to test the threat capabilities of autonomous AI agents. It succeeded on 2 of 10 attempts, compared with 3 successes for Mitos.

Why it matters: This result updates the frontier cybersecurity capability baseline within the foundation model segment and continues a recurring pattern of rapid capability escalation driven by improvements in coding, reasoning, and long-horizon autonomy. The head-to-head comparison between OpenAI and Anthropic directly informs the open debate over which safety-oriented lab is producing the most capable (and potentially dangerous) frontier models. GPT-5.5 surpasses Mitos on average pass rate (71.4% vs. 68.6%) but trails it on the hardest simulation (2 completions vs. 3 for Mitos), so the competitive gap is narrowing but the outcome is not yet conclusive.

Grounded take: The AISI evaluation is a controlled laboratory benchmark, not a reflection of deployed product safety, as the institute itself noted. However, the fact that a model can autonomously execute a multi-stage cyberattack that would take a human ~20 hours signals a structural force: the compression of offensive cybersecurity work into AI agent time scales. Frontier labs are locked in a race in which safety evaluation standards themselves become competitive scoreboards. For enterprises, this reinforces the need for defensive AI agents and monitor-first deployment strategies, as the underlying model capability cycle shows no sign of plateauing.

#OpenAI #GPT-5.5 #Anthropic #Claude-Mitos #AI-Safety-Institute #AI-Safety #Cybersecurity #Autonomous-Agent-Threat