Question 1

What is ai-coustics and what problem does it solve for voice AI?

Accepted Answer

ai-coustics is the audio intelligence layer for Voice AI. Founded in Berlin in 2021, we are a team of audio natives building the SDK that enhances, isolates, and balances speech in real time, so voice agents get clean, reliable audio input before it reaches ASR and your LLM.

Question 2

What is Quail Voice Focus? How is it different from noise cancellation?

Accepted Answer

Standard noise cancellation removes ambient noise, whereas Quail Voice Focus is a foreground speaker isolation model: it identifies the primary speaker and suppresses everything else, such as background chatter, TVs, or competing voices. Quail, on the other hand, is designed for multi-speaker and far-field environments: it enhances all speech in the room rather than isolating one voice. Both are optimized for ASR accuracy, not human listening.

Question 3

Which speech enhancement model should I use for voice agents?

Accepted Answer

It depends on your environment and use case.

- Quail Voice Focus 2.0 is best for single-user voice agents (phone bots, voice assistants) where one person speaks near the microphone and background voices need to be suppressed.
- Quail is best for far-field or multi-speaker scenarios where you want to enhance all speech in the room rather than isolate one speaker.
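The rule of thumb above can be written as a small decision helper. This is an illustrative sketch only: `Environment`, `pick_model`, and the returned strings are hypothetical names, not part of the ai-coustics SDK, and a real deployment decision may depend on more than these two inputs.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    # Hypothetical description of a deployment; not an ai-coustics SDK type.
    far_field: bool        # microphone is far from the speaker(s)
    speakers_to_keep: int  # how many people the agent should actually hear


def pick_model(env: Environment) -> str:
    """Return the model suggested by the FAQ's rule of thumb (illustrative only)."""
    if env.far_field or env.speakers_to_keep > 1:
        # Enhance all speech in the room rather than isolating one voice.
        return "Quail"
    # One near-field user: isolate them and suppress background voices.
    return "Quail Voice Focus 2.0"


print(pick_model(Environment(far_field=False, speakers_to_keep=1)))  # Quail Voice Focus 2.0
```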

Question 4

Does ai-coustics improve speech-to-text accuracy?

Accepted Answer

Yes. In benchmarks across seven commercial ASR providers (including Deepgram, AssemblyAI, Soniox, Speechmatics, and Gladia), Quail Voice Focus 2.0 reduced Word Error Rate by up to 43% on real-world noisy recordings. It also improves voice activity detection (VAD) reliability: balanced detection accuracy increases from 79% with Silero VAD to 90% with Quail VAD. That means better turn-taking and less of the latency that users perceive as an unresponsive agent.
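Balanced accuracy averages the detection rate on speech frames and on non-speech frames, so a VAD cannot score well simply by labelling everything as speech (or everything as silence). The sketch below assumes per-frame boolean labels from a reference annotation and from a VAD; it shows the generic metric, not the exact benchmark methodology behind the 79% and 90% figures.

```python
from typing import Sequence


def balanced_accuracy(reference: Sequence[bool], predicted: Sequence[bool]) -> float:
    """Mean of recall on speech frames and recall on non-speech frames."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    # Speech frames the VAD correctly flagged as speech.
    true_pos = sum(r and p for r, p in zip(reference, predicted))
    # Non-speech frames the VAD correctly left as non-speech.
    true_neg = sum((not r) and (not p) for r, p in zip(reference, predicted))
    speech = sum(reference)
    silence = len(reference) - speech
    if speech == 0 or silence == 0:
        raise ValueError("need both speech and non-speech frames")
    return 0.5 * (true_pos / speech + true_neg / silence)


# 4 speech frames (3 detected) and 4 non-speech frames (2 correctly rejected).
ref = [True, True, True, True, False, False, False, False]
vad = [True, True, True, False, False, False, True, True]
print(balanced_accuracy(ref, vad))  # 0.625
```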

Question 5

Why can I still hear some background noise after processing?

Accepted Answer

For voice agents, the goal is not always to produce the cleanest audio for human ears. Quail models are optimized for speech AI systems, not just perceptual denoising, and that can mean keeping a small amount of ambient context if it helps STT perform better. If your goal is human listening quality, use a human-listening model such as Rook.

Question 6

Can I deploy ai-coustics speech enhancement on-premise?

Accepted Answer

Yes, the ai-coustics SDK runs entirely within your infrastructure. Audio is processed on-device; no audio data is transmitted to ai-coustics servers. The only thing sent back is usage telemetry for billing (minutes consumed).

Question 7

How do I evaluate ai-coustics Voice AI speech enhancement?

Accepted Answer

Word Error Rate (WER) is the primary metric for voice AI. Start by capturing audio before and after enhancement, run both versions through your STT provider, and compare WER on a representative set of real production calls. The Developer Platform and Hugging Face demo let you test on your own audio files before you go to production. For ongoing monitoring, track WER alongside VAD false-trigger rates and turn-taking latency.
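A minimal sketch of the before/after comparison, assuming you already have a reference transcript for each call plus the STT output for the raw and the enhanced audio. It computes corpus-level WER as word-level edit distance (substitutions + deletions + insertions) divided by the total number of reference words. Production evaluations usually also normalize punctuation, casing, and numbers before scoring; this sketch only lowercases and splits on whitespace, and the example transcripts are invented for illustration.

```python
from typing import Iterable, Tuple


def word_errors(reference: str, hypothesis: str) -> Tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j]: edit distance between the current ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (0 if words match)
            )
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total word errors divided by total reference words across all calls."""
    errors = words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
    if words == 0:
        raise ValueError("no reference words to score against")
    return errors / words


refs = ["book a table for two", "cancel my order"]
raw_stt = ["book a cable for", "cancel my older"]
enhanced_stt = ["book a table for two", "cancel my order"]
print(corpus_wer(zip(refs, raw_stt)))       # 0.375
print(corpus_wer(zip(refs, enhanced_stt)))  # 0.0
```

Pooling errors across calls (rather than averaging per-call WER) keeps short calls from dominating the result.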

Question 8

What programming languages does the ai-coustics SDK support?

Accepted Answer

ai-coustics offers SDK language bindings for Python, Rust, Node.js, C++, C and WASM. Each repository includes detailed integration instructions, examples, and release information.

Question 9

Does ai-coustics work with LiveKit, Pipecat, and custom pipelines?

Accepted Answer

Yes. For LiveKit, there is a native plugin (livekit-plugins-ai-coustics) that needs no separate license key: you authenticate through LiveKit Cloud and billing flows through LiveKit. If you self-host, you can use the plugin with an ai-coustics SDK key instead. For Pipecat, the AICFilter class drops into your existing pipeline after a pip install. For custom stacks, the SDK offers bindings for several languages, along with example code.
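In a custom stack, the enhancement stage sits in the audio path before VAD and STT. The sketch below assumes the enhancer consumes fixed-size frames, which is common for real-time audio processors; check the frame size your SDK binding actually expects. `reframe`, `enhanced_stream`, and `enhance_frame` are hypothetical names, and `enhance_frame` is a placeholder for whatever processing call the binding exposes. This does not use the real ai-coustics API.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[float]


def reframe(chunks: Iterable[Frame], frame_size: int) -> Iterator[Frame]:
    """Regroup arbitrarily sized audio chunks into fixed-size frames."""
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    buffer: Frame = []
    for chunk in chunks:
        buffer.extend(chunk)
        while len(buffer) >= frame_size:
            yield buffer[:frame_size]  # slice is a copy, safe to hand downstream
            del buffer[:frame_size]
    # A trailing partial frame is dropped here for simplicity; a real pipeline
    # would keep it buffered until more audio arrives.


def enhanced_stream(
    chunks: Iterable[Frame],
    frame_size: int,
    enhance_frame: Callable[[Frame], Frame],  # hypothetical stand-in for the SDK call
) -> Iterator[Frame]:
    """Pass each fixed-size frame through the enhancer before it reaches VAD/STT."""
    for frame in reframe(chunks, frame_size):
        yield enhance_frame(frame)


# Demonstration with a placeholder "enhancer" that simply halves the signal.
incoming = [[0.2, 0.4, 0.6], [0.8], [1.0, 1.2]]
for frame in enhanced_stream(incoming, frame_size=2, enhance_frame=lambda f: [x * 0.5 for x in f]):
    print(frame)
```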

Question 10

How does ai-coustics handle different languages or accents?

Accepted Answer

All ai-coustics models are language-agnostic. Our models are trained on data in over 65 languages. Enhancement and speaker isolation work the same way whether your users are speaking English, German, Arabic, or anything else.

Real-time speech enhancement for Voice AI

Lower WER in real time

Reliable turn detection

Fewer background insertions

Quail

ASR Primer

Quail Voice Focus

Primary Speaker Isolation

One SDK. Integrated in minutes.

Built for your stack

Try for free now in our Developer Platform

Your questions, answered

Bring real-time audio intelligence into your voice AI stack

Sparrow