
OpenAI model outperforms doctors in ER diagnoses, achieving 67% accuracy vs 55%
The AMW Read
Updates the clinical validation baseline for healthcare AI (segment-level significance); novel outperformance on edge cases advances the debate over an AI-assisted standard of care.
A peer-reviewed study compared an unnamed OpenAI model against physicians under simulated emergency-room conditions, providing both with medical profiles and patient histories. The AI reached a correct or close-to-correct diagnosis in 67% of cases, versus 55% for the doctors. The research, published in Science, highlights the potential for AI agents to become commonplace in emergency medicine within a decade.
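For readers who want a feel for whether a 12-point accuracy gap is statistically meaningful, here is a minimal sketch of a two-proportion z-test. The study's sample sizes are not reported here, so the 200-cases-per-arm figure below is a purely hypothetical assumption for illustration, not a number from the paper:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(p1: float, n1: int, p2: float, n2: int) -> tuple[float, float]:
    """Two-sided two-proportion z-test; returns (z statistic, p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)               # pooled success rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))   # standard error of the difference
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# HYPOTHETICAL: the study's sample size is not given; 200 cases per arm
# is an illustrative assumption only.
z, p = two_proportion_z_test(0.67, 200, 0.55, 200)
print(f"z = {z:.2f}, p = {p:.4f}")  # roughly z = 2.46, p = 0.014 under these assumptions
```

Under that assumed sample size, the gap would clear the conventional p < 0.05 threshold; with substantially fewer cases per arm, it might not, which is why the reported cohort size matters when weighing the headline numbers.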
This milestone updates the clinical validation baseline for healthcare AI, moving beyond narrow imaging tasks into high-stakes triage and differential diagnosis. It reinforces the pattern that frontier models can match or exceed specialist accuracy even under constrained, real-world-like conditions. The result intensifies pressure on health systems to consider AI-assisted workflows as a standard of care, especially as inference costs fall.
A medical AI startup CEO argued that failing to use frontier models for second opinions already constitutes 'malpractice,' a maximalist position in the safety and adoption debate. While regulatory hurdles and liability questions remain, the evidence gap is narrowing: the study positions OpenAI as a potential standard-setter in clinical decision support, threatening traditional medical AI vendors.


