OpenAI launches three real-time voice models with GPT-5-level reasoning, cuts simultaneous translation to $0.034/min
The AMW Read
Novelty 2: product line extension for a known case-study player, not a new entrant; Significance 3: pricing and reasoning voice capabilities reshape multiple adjacent markets (interpretation, transcription, voice agents) across segments.
OpenAI has released three new real-time voice models, GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper, that cover end-to-end voice reasoning, simultaneous interpretation, and streaming transcription through a single API. GPT-Realtime-2 is the first OpenAI voice model with GPT-5-level reasoning; it supports a 128K context window (up from 32K), parallel tool calling, and five adjustable reasoning levels. GPT-Realtime-Translate performs streaming simultaneous translation from 70+ input languages into 13 output languages, priced at $0.034/minute (~RMB 0.25/min), or roughly $2/hour. GPT-Realtime-Whisper offers streaming speech-to-text at $0.017/minute (~$1/hour). All three models are available immediately in the OpenAI Playground and via the API, with Codex prompt templates.
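The per-hour figures follow directly from the per-minute prices quoted above. A minimal sketch of that arithmetic, assuming the quoted list prices apply uniformly to every minute of audio (the function names and the 45-minute session are illustrative, not part of any OpenAI SDK):

```python
# Per-minute list prices in USD, as quoted in the article.
PRICE_PER_MINUTE = {
    "GPT-Realtime-Translate": 0.034,
    "GPT-Realtime-Whisper": 0.017,
}


def hourly_cost(per_minute: float) -> float:
    """Convert a per-minute price to a per-hour price, rounded to cents."""
    return round(per_minute * 60, 2)


def session_cost(per_minute: float, minutes: float) -> float:
    """Cost of a single session of the given length, rounded to cents."""
    return round(per_minute * minutes, 2)


if __name__ == "__main__":
    for model, price in PRICE_PER_MINUTE.items():
        print(f"{model}: ${hourly_cost(price):.2f}/hour")
    # Hypothetical example: a 45-minute interpreted call.
    cost = session_cost(PRICE_PER_MINUTE["GPT-Realtime-Translate"], 45)
    print(f"45-minute translated call: ${cost:.2f}")
```

The exact hourly figures come out slightly above the rounded "$2" and "$1" used in the text.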
Why it matters: This launch extends the hyperscaler distribution moat by pushing reasoning-grade voice AI into API endpoints that any developer can embed, reducing simultaneous interpretation, a high-cost professional service, to a commodity API call. The 66x cost reduction vs. human interpreters mirrors the capital-compression arc seen in other segments, where frontier models open previously exclusive enterprise workflows to any buyer. GPT-Realtime-2's parallel tool calling and 128K context also extend the context-engineering moat pattern: voice agents can now orchestrate multi-step business processes without keyboard input. Zillow's internal benchmark, which showed call success rate rising 26 percentage points (from 69% to 95%), suggests that reasoning voice models can handle high-compliance, high-value enterprise scenarios.
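Two of the figures in this paragraph are easier to read once unpacked. The sketch below back-calculates the human interpreter rate that a 66x reduction would imply, and separates the Zillow result into its absolute and relative components. The implied hourly rate is an inference from the article's numbers, not a figure the article states, and the function names are illustrative.

```python
def implied_baseline_rate(api_hourly_usd: float, reduction_factor: float) -> float:
    """Hourly rate a cost-reduction multiple implies for the replaced service."""
    return api_hourly_usd * reduction_factor


def percentage_point_change(before_pct: float, after_pct: float) -> float:
    """Absolute change between two percentages, in percentage points."""
    return after_pct - before_pct


def relative_change(before_pct: float, after_pct: float) -> float:
    """Change relative to the starting value, as a fraction."""
    return (after_pct - before_pct) / before_pct


if __name__ == "__main__":
    # $0.034/min * 60 = $2.04/hour for GPT-Realtime-Translate.
    api_hourly = 0.034 * 60
    human_rate = implied_baseline_rate(api_hourly, 66)
    print(f"Implied human interpreter rate: ${human_rate:.2f}/hour")

    # Zillow call success rate: 69% before, 95% after.
    points = percentage_point_change(69, 95)
    rel = relative_change(69, 95)
    print(f"Success rate: +{points:.0f} points, {rel:.1%} relative gain")

    # The same result viewed as failures: 31% before, 5% after.
    failure_drop = -relative_change(100 - 69, 100 - 95)
    print(f"Failure rate reduced by {failure_drop:.1%}")
```

Reading the Zillow result as a drop in failed calls (31% to 5%) arguably shows the size of the improvement more clearly than the 26-point headline.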
Grounded expert take: OpenAI is executing a clear market strategy: bundle GPT-5-level reasoning into the lowest-priced real-time voice API on the market, commoditizing both simultaneous interpretation and traditional speech-to-text while raising the capability floor for voice agents. The pricing ($2/hour for translation, $1/hour for transcription) sets a new benchmark that competitors such as Deepgram, AssemblyAI, and ElevenLabs will have to match or undercut. For the interpretation industry, the value of human translators will migrate upward to cultural nuance, legal precision, and creative expression, while standardized translation volume shifts to API consumption. GPT-Realtime-2's adjustable reasoning levels (low to xhigh) also create a tiered pricing structure that lets developers trade latency for intelligence, a pattern that could become standard for voice inference.

