Introducing the Quail VAD Model: Robust Voice Activity Detection for Real-Time Audio

Traditional voice activity detection (VAD) solutions like Silero VAD often fall short in real-time Voice AI and Voice Agent pipelines. They tend to struggle with sudden or changing noise and with difficult acoustics, such as background music or a reverberant room. As a result, they typically need an extra pre-processing step, such as denoising with a tool like Krisp, to perform reliably. This adds cost, latency and complexity to the pipeline.

Our new Quail VAD model solves this problem.

Performance in real-world environments

Real-world audio presents complex, unpredictable noise, but the ai-coustics VAD handles this challenge well. Across the samples below (‘Construction Site’, ‘Background Music’ and ‘Train Station’), Silero VAD fails to detect speech reliably: it misses segments in the first sample and fails completely on the other two. In contrast, the ai-coustics VAD accurately detects speech.

Built directly into the ai-coustics SDK, Quail VAD is designed to work without a separate denoising step. Instead, it uses the real-time audio enhancement technology powering Quail to deliver accurate speech detection even in challenging acoustic conditions.

As a result, it offers faster, cleaner, and more responsive performance for voice agents, live conferencing, and streaming applications, without the need for preprocessing.

Supporting natural conversation flow

Voice activity detection (VAD) determines whether a segment of audio contains speech. It helps systems identify when a speaker starts and stops talking, which makes it easier to manage turn-taking and conversation flow.
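To make this concrete, here is a minimal sketch of how per-frame speech decisions can be turned into start and end times. It is a generic illustration, not code from the ai-coustics SDK: the function name `frames_to_segments` and the 10 ms frame length are assumptions chosen for the example.

```python
from typing import List, Tuple


def frames_to_segments(flags: List[bool], frame_ms: int) -> List[Tuple[int, int]]:
    """Convert per-frame speech flags into (start_ms, end_ms) segments.

    `flags[i]` is True when frame i was classified as speech.
    `frame_ms` is the (assumed) duration of one frame in milliseconds.
    """
    segments: List[Tuple[int, int]] = []
    start = None  # start time of the segment currently open, if any
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            # Speech begins: open a new segment at this frame.
            start = i * frame_ms
        elif not is_speech and start is not None:
            # Speech ends: close the open segment at this frame boundary.
            segments.append((start, i * frame_ms))
            start = None
    if start is not None:
        # The audio ended while the speaker was still talking.
        segments.append((start, len(flags) * frame_ms))
    return segments


flags = [False, True, True, False, False, True]
print(frames_to_segments(flags, frame_ms=10))  # [(10, 30), (50, 60)]
```

A turn-taking component can use the end time of the latest segment to decide when the speaker has finished.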

Key benefits of the new VAD module:

  • Reliable detection: Accurately identifies speech segments in complex acoustic environments, maintaining consistent performance even with low signal-to-noise ratios or background interference.
  • Modular and efficient: Because it’s fully compatible with Quail’s speech enhancement models and integrated directly into the SDK, the VAD adds minimal processing overhead.
  • Easy deployment: Runs within ai-coustics’ lightweight Rust SDK, with no additional dependencies like Torch or ONNX required.
  • All-in-one: A single SDK to handle detection, enhancement and integration, with everything needed for real-time audio processing in one package.

Quail VAD is also simple to configure: two tunable parameters let you adjust sensitivity and latency to suit your use case.
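The post does not name the two parameters, so the sketch below is only a generic illustration of how a sensitivity setting and a latency setting typically interact in VAD post-processing. The names `threshold` and `hold_frames` are hypothetical and are not the SDK's actual parameters.

```python
from typing import List


def smooth_decisions(probs: List[float], threshold: float, hold_frames: int) -> List[bool]:
    """Turn per-frame speech probabilities into stable speech/non-speech flags.

    threshold   -- hypothetical sensitivity knob: a lower value flags more
                   frames as speech.
    hold_frames -- hypothetical latency knob: how many consecutive
                   below-threshold frames are tolerated before speech is
                   declared over. Larger values bridge short pauses but
                   delay the end-of-speech decision.
    """
    decisions: List[bool] = []
    in_speech = False
    silent_run = 0  # consecutive below-threshold frames while in speech
    for p in probs:
        if p >= threshold:
            in_speech = True
            silent_run = 0
        elif in_speech:
            silent_run += 1
            if silent_run > hold_frames:
                in_speech = False
                silent_run = 0
        decisions.append(in_speech)
    return decisions


probs = [0.1, 0.9, 0.2, 0.8, 0.1, 0.1, 0.1]
# With no hold, the brief dip at index 2 splits the utterance in two.
print(smooth_decisions(probs, threshold=0.5, hold_frames=0))
# With a one-frame hold, the dip is bridged, but speech ends one frame later.
print(smooth_decisions(probs, threshold=0.5, hold_frames=1))
```

The trade-off is the usual one: a longer hold gives fewer false end-of-turn events at the cost of a slower response once the speaker really has stopped.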

How does Quail VAD perform?

We tested the new Quail VAD against Silero VAD, which is widely used in voice agents.

Figure: Silero VAD versus ai-coustics Quail VAD on F1 Score and Balanced Accuracy. Quail VAD scores higher on both.

The ai-coustics VAD outperforms Silero VAD on both F1 Score and Balanced Accuracy when evaluated on the MSDWild dataset. This dataset was chosen for its realistic acoustic conditions and high background noise, which make it a challenging and representative benchmark for voice agent applications.
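For readers unfamiliar with these two metrics, the sketch below shows how they are commonly computed from frame-level labels. It uses the standard definitions and made-up example labels. It is not the evaluation code behind the benchmark, and the function name is an assumption.

```python
from typing import Sequence, Tuple


def f1_and_balanced_accuracy(truth: Sequence[bool], pred: Sequence[bool]) -> Tuple[float, float]:
    """Compute F1 score and balanced accuracy for binary speech labels.

    truth[i] / pred[i] are True when frame i is (labeled / predicted as) speech.
    """
    tp = sum(1 for t, p in zip(truth, pred) if t and p)          # speech found
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)      # speech missed
    fp = sum(1 for t, p in zip(truth, pred) if not t and p)      # false alarm
    tn = sum(1 for t, p in zip(truth, pred) if not t and not p)  # silence kept

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0       # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate

    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    balanced_accuracy = (recall + specificity) / 2
    return f1, balanced_accuracy


# Illustrative labels only, not benchmark data.
truth = [True, True, True, False, False, False, False, True]
pred = [True, True, False, False, False, True, False, True]
print(f1_and_balanced_accuracy(truth, pred))  # (0.75, 0.75)
```

F1 rewards finding speech without raising false alarms, while balanced accuracy averages the hit rates on speech and non-speech frames, so it stays informative when one class dominates the recording.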

What does that mean for your voice agent?

  • Increased ASR (Automatic Speech Recognition) quality: By precisely detecting when speech starts and stops, the Quail VAD model helps ASR systems focus only on the relevant segments. This reduces false transcriptions and improves overall recognition quality.
  • Improved turn-taking: VAD output often serves as an input to turn-taking models. The more accurate the VAD, the better a system can handle conversational timing and speaker transitions.
  • Lightweight performance: The Quail VAD adds only minimal processing overhead. It’s designed to run efficiently without noticeably increasing CPU load.

Try the ai-coustics SDK today

If you’re already using Quail for speech enhancement, the new Quail VAD model integrates seamlessly, delivering advanced speech detection with negligible impact on performance.

Ready to experience real-time voice enhancement with integrated VAD? Get in touch for a personalized demo, or sign up to our developer platform to obtain your SDK key. You can then clone or download the SDK code from our GitHub repository to start testing it locally.
