Introducing the new Quail VAD 2.0: Robust speech detection for real-time voice AI

Written by Tim Janke, Head of Machine Learning


Voice Activity Detection (VAD) is a critical part of any real-time voice pipeline. It determines when speech is present, when a turn starts, and when audio should be passed to downstream models.

When VAD is unreliable, the entire experience suffers. Speech can be missed, turns can feel unnatural, and voice agents may respond too early, too late, or not at all.
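To make the "too early, too late" trade-off concrete, here is a minimal sketch of a silence-timeout end-of-turn detector. It is not part of the ai-coustics SDK: the class name `EndOfTurnDetector`, the 10 ms frame size, and the 500 ms timeout are illustrative assumptions, and the input is assumed to be one boolean speech decision per frame from whatever VAD you use.

```python
class EndOfTurnDetector:
    """Hypothetical helper: flags the end of a turn after sustained silence."""

    def __init__(self, frame_ms: int = 10, silence_timeout_ms: int = 500) -> None:
        # Number of consecutive non-speech frames required to end a turn.
        self.frames_needed = max(1, silence_timeout_ms // frame_ms)
        self.in_turn = False
        self.silent_frames = 0

    def update(self, is_speech: bool) -> bool:
        """Feed one frame decision; return True on the frame where the turn ends."""
        if is_speech:
            # Any speech frame starts (or continues) a turn and resets the timer.
            self.in_turn = True
            self.silent_frames = 0
            return False
        if not self.in_turn:
            # Silence before any speech: nothing to end.
            return False
        self.silent_frames += 1
        if self.silent_frames >= self.frames_needed:
            self.in_turn = False
            self.silent_frames = 0
            return True
        return False
```

A short timeout makes the agent respond quickly but risks cutting the speaker off during a pause; a long timeout is safer but feels sluggish. Either way, the detector is only as good as the per-frame decisions it receives: a missed speech frame in noise can end a turn too early.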

That is why we developed Quail VAD 2.0: a lightweight, modular VAD model built for robust real-time speech detection in challenging acoustic environments. Unlike the previous VAD functionality in the ai-coustics SDK, which was integrated into our speech enhancement models, Quail VAD 2.0 is purpose-built for speech activity detection and can be used independently or together with the rest of the Quail audio stack.

Built for real-world audio

Production audio is rarely clean. Voice AI systems need to handle background noise, reverberation, echo, compression artifacts, distortion, low-quality microphones, music, far-field speech, and rapidly changing environments.

Imagine someone calling your agent from a train, or a customer ordering at a drive-through kiosk next to a busy road.

Quail VAD 2.0 is designed for these conditions. It provides reliable voice activity detection even when the input signal is noisy or degraded, helping real-time systems stay responsive and accurate in everyday deployment scenarios.

Integrated directly into the ai-coustics SDK

Quail VAD 2.0 runs natively inside the ai-coustics SDK through AirTen, our lightweight inference engine.

Developers can add robust speech detection without introducing a separate runtime, model format, or inference stack. VAD and speech enhancement can run within the same SDK, making deployment simpler while keeping latency and compute overhead low.

For teams building voice agents, live communication tools, or streaming applications, this means fewer moving parts and a more reliable real-time audio pipeline.

Key benefits

Reliable speech detection: Accurately identifies voice activity in noisy, distorted, and reverberant environments.

Low latency and lightweight compute: Designed for real-time applications with minimal processing overhead.

Native SDK integration: Runs directly in the ai-coustics SDK through AirTen, with no additional inference runtime such as ONNX Runtime required.

Flexible: Use raw probability values to implement custom smoothing and thresholding logic, or rely on the integrated post-processing layer and tune it to your stack.

Modular audio processing: Combine Quail VAD 2.0 with Quail Voice Focus for primary speaker isolation and speech activity detection in multi-talker environments.
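As an illustration of the custom smoothing and thresholding mentioned above, the following sketch applies exponential smoothing and hysteresis (separate on and off thresholds) to a stream of per-frame speech probabilities. It does not use the ai-coustics SDK API: the function name `smooth_and_threshold` and all parameter values are assumptions chosen for the example, and the probabilities are assumed to arrive as plain floats between 0 and 1.

```python
from typing import Iterable, List


def smooth_and_threshold(
    probs: Iterable[float],
    alpha: float = 0.3,
    on_threshold: float = 0.6,
    off_threshold: float = 0.4,
) -> List[bool]:
    """Hypothetical post-processing: exponential smoothing plus hysteresis."""
    decisions: List[bool] = []
    smoothed = 0.0
    speaking = False
    for p in probs:
        # Exponential moving average damps single-frame spikes and dips.
        smoothed = alpha * p + (1.0 - alpha) * smoothed
        if speaking:
            # Stay in the speech state until we fall below the lower threshold.
            speaking = smoothed >= off_threshold
        else:
            # Enter the speech state only above the higher threshold.
            speaking = smoothed >= on_threshold
        decisions.append(speaking)
    return decisions
```

Using two thresholds instead of one avoids rapid flipping when the probability hovers around a single cut-off. A larger `alpha` reacts faster but smooths less.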

Performance in challenging conditions

We evaluated Quail VAD 2.0 against the widely used Silero VAD on an internal benchmark of challenging real-world speech detection scenarios.

In clean and controlled audio, modern VAD systems often perform well. The real difference appears in challenging acoustic conditions, where noise, distortion, and reverberation can lead to missed speech segments.

Quail VAD 2.0 is designed to reduce these failure modes and provide stable speech detection in the kinds of environments where real voice applications operate.

Qualitative examples in different environments

[Audio examples: Music, Construction, Trains. Tracks: Voice 01, Voice 02, Voice 03, Background music]

What this means for your voice application

With Quail VAD 2.0, voice systems can detect speech more reliably, route speech segments to ASR more accurately, and support more natural turn-taking behavior.
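Routing speech to ASR usually means turning per-frame decisions into time segments. The sketch below shows one way to do that, under stated assumptions: frame decisions are booleans at a fixed frame size, each segment is padded so word onsets and endings are not clipped, and segments separated by a short gap are merged. The function name `frames_to_segments` and the default values are hypothetical and not taken from the ai-coustics SDK.

```python
from typing import List, Sequence, Tuple


def frames_to_segments(
    decisions: Sequence[bool],
    frame_ms: int = 10,
    pad_ms: int = 100,
    min_gap_ms: int = 200,
) -> List[Tuple[int, int]]:
    """Hypothetical helper: convert frame decisions to (start_ms, end_ms) segments."""
    total_ms = len(decisions) * frame_ms

    # Step 1: collect raw runs of consecutive speech frames.
    raw: List[Tuple[int, int]] = []
    start = None
    for i, is_speech in enumerate(decisions):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            raw.append((start * frame_ms, i * frame_ms))
            start = None
    if start is not None:
        raw.append((start * frame_ms, total_ms))

    # Step 2: pad each run and merge runs separated by less than min_gap_ms.
    merged: List[Tuple[int, int]] = []
    for seg_start, seg_end in raw:
        seg_start = max(0, seg_start - pad_ms)
        seg_end = min(total_ms, seg_end + pad_ms)
        if merged and seg_start - merged[-1][1] < min_gap_ms:
            merged[-1] = (merged[-1][0], max(merged[-1][1], seg_end))
        else:
            merged.append((seg_start, seg_end))
    return merged
```

Each returned pair can then be used to slice the audio buffer before sending it to the recognizer. More padding protects against clipped words at the cost of forwarding more non-speech audio.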

Because it is integrated directly into the ai-coustics SDK, teams can add robust VAD to their pipeline without increasing deployment complexity.

The result is a simpler, more reliable foundation for real-time voice AI.

Try Quail VAD 2.0 in the ai-coustics SDK

The new Quail VAD 2.0 is now available as part of the ai-coustics SDK.

Book a demo, sign up for our developer platform, or clone the SDK from GitHub to start testing locally.

Bring real-time audio intelligence into your voice AI stack
