
Krisp launches VIVA 2.0 voice AI infrastructure layer with real-time audio pre-processing
The AMW Read
Incremental update from a known infrastructure player in the voice AI segment; the turn- and interrupt-prediction features are novel but sit within an established product trajectory, with significance limited to the voice-agent robustness sub-segment.
Krisp has released VIVA 2.0, an updated voice AI infrastructure layer designed to improve interaction quality for voice agents in noisy or complex acoustic environments. The update introduces a new generation of real-time models that process the audio signal before it reaches automatic speech recognition (ASR), reducing error rates and improving conversational fluidity. Key components include Turn Prediction v3, which uses audio alone to predict when a speaker's turn is ending so that the agent does not interrupt unintentionally, and Interrupt Prediction v1, a first-of-its-kind module that distinguishes intentional user interjections from background acknowledgments. VIVA 2.0 also adds text-to-speech detection, accent detection, and gender detection, enabling voice agents to recognize synthetic speech and adapt to diverse speaker characteristics. Krisp reports that its VIVA SDK now processes over 12 billion minutes of annual traffic and is integrated into more than 130 products, including Daily and Vapi. Telnyx CEO David Casem noted that the approach improves signal quality at the source.
Why it matters: Krisp's VIVA 2.0 targets real-world acoustic robustness, a structural bottleneck in voice-agent deployment that has limited enterprise adoption of conversational AI in customer service, call centers, and voice-enabled applications. By processing audio before the ASR stage, Krisp pre-conditions the signal to reduce downstream errors, a strategy that mirrors the "signal-processing moat" seen in earlier voice-interface winners. The scale of its deployment (more than 12 billion minutes annually, embedded in 130+ products) suggests Krisp is building a defensible infrastructure layer underneath the voice-agent stack, analogous to how Twilio became the messaging backbone. This positions Krisp as a horizontal enabler rather than a vertically integrated voice-agent provider, allowing it to capture value across multiple agent platforms.
For enterprise buyers evaluating voice AI providers, latency and accuracy in noisy environments remain the highest-friction points. Krisp's real-time turn- and interrupt-prediction models could meaningfully reduce the awkward "talking over" failures that degrade user trust in voice agents. The accent and gender detection features also signal an attempt to handle demographic diversity in speech, a long-standing weakness of many speech recognition pipelines. Krisp faces competition from broader ASR providers and full-stack voice-agent platforms, but its SDK-centric approach and existing integration footprint give it a distribution advantage. The company has a credible path to becoming the default audio pre-processing layer for the voice AI ecosystem.
#VoiceAI #AIInfrastructure #Krisp #ConversationalAI #AgentInfrastructure #SpeechRecognition