How telli powers enterprise Voice AI with LiveKit and ai-coustics

Written by Nell Campbell, Head of Marketing

Case studies

May 19, 2026

Founded in 2024 by Philipp Baumanns, Seb Hapte-Selassie and Finn zur Mühlen, telli is one of Europe's fastest-rising AI startups. Backed by Y Combinator, they build voice agents that automate high-volume phone operations for B2C companies, qualifying leads, booking appointments and handling customer service.

In under two years, telli scaled from zero to over five million calls processed, serving some of Europe's most demanding enterprises across energy, telecom and beyond. Building on LiveKit, the real-time infrastructure powering voice AI at scale, gave them the foundation to get there. Ensuring their agents worked reliably in production is where ai-coustics came in, natively integrated into LiveKit and purpose-built for the audio conditions that voice agents actually face.

“At our scale, audio quality isn't a detail. It's the difference between converting a customer and losing one. ai-coustics gives us the clarity our agents need to perform at volume.”
Seb Hapte-Selassie, Co-founder, telli

The cost of bad audio

In phone-based customer acquisition, the real bottleneck isn't speech generation. It's making voice work reliably across thousands of calls a day, through noisy environments, telephony compression and the unpredictability of human speech.

At enterprise volume, every edge case becomes a pattern. Short utterances were proving difficult for the STT models, degrading voice activity detection and breaking end-of-speech detection and turn-taking. The issues weren't telli's to solve alone; they were central to how voice AI handles real-world phone audio.

The cost of poor audio isn't abstract. Every misunderstood call risks escalating to a human agent at 5-8x the cost, and at telli's volumes, those losses compound fast. With enterprise clients operating under strict SLAs where every unresolved call is tracked, the question was never whether audio quality mattered, but how to fix it.
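
To make the compounding concrete, here is a back-of-the-envelope sketch in Python. The 5-8x multiplier is the figure quoted above; the call volume, escalation rate and per-call cost are hypothetical placeholders, not telli data, and the sketch assumes the multiplier is measured against the cost of a call the voice agent handles on its own.

```python
def added_escalation_cost(calls, escalation_rate, agent_cost_per_call, human_multiplier):
    """Extra spend caused by calls that fall back to a human agent.

    human_multiplier: cost of a human-handled call as a multiple of a
    voice-agent-handled call (assumed interpretation of the 5-8x figure).
    """
    escalated_calls = calls * escalation_rate
    # Each escalated call costs (multiplier - 1) times more than it would have.
    return escalated_calls * agent_cost_per_call * (human_multiplier - 1)


# HYPOTHETICAL inputs, chosen only for illustration.
DAILY_CALLS = 10_000
ESCALATION_RATE = 0.02       # share of calls misunderstood and escalated
AGENT_COST_PER_CALL = 1.0    # one arbitrary cost unit per automated call

for multiplier in (5, 8):
    extra = added_escalation_cost(
        DAILY_CALLS, ESCALATION_RATE, AGENT_COST_PER_CALL, multiplier
    )
    print(f"{multiplier}x multiplier: {extra:.0f} extra cost units per day")
```

Because the extra cost scales linearly with both volume and escalation rate, even a small drop in misunderstood calls shows up directly in the total.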

“Audio intelligence is the missing layer in voice AI, the gap between an agent that sounds good in a demo and one that works in production. telli recognized this early on, and it's the reason their agents run as reliably as they do.”
Fabian Seipel, Co-founder, ai-coustics

Engineering the fix

telli integrated ai-coustics' SDK and Quail Voice Focus model directly into their voice agent pipeline on LiveKit. The two teams worked engineer-to-engineer through a shared Slack channel, running hands-on evaluations on real agent calls across a range of languages.

What they found was telling. The core failures, undertranscription of short utterances and end-of-speech detection errors, weren't problems that traditional denoising models could fix. Most speech enhancement tools are tuned for human listening, making audio sound cleaner to the human ear. Quail is built for machine understanding, optimizing the signal specifically for the phonetic details that STT models, VAD and turn-taking logic need to perform.
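
The two failure modes are easier to see in a toy endpointer. The Python sketch below is not ai-coustics' or telli's implementation: it assumes a per-frame speech/no-speech decision is already available, and the function name and both thresholds (`min_speech_frames`, `end_silence_frames`) are hypothetical. It shows how a short utterance can be discarded as noise, and how residual noise flagged as speech can keep the end-of-speech timer from ever firing.

```python
from typing import List, Optional, Tuple


def detect_turn(
    frames: List[bool],
    min_speech_frames: int = 3,
    end_silence_frames: int = 5,
) -> Optional[Tuple[int, int]]:
    """Return (start_frame, endpoint_frame) of the first detected turn.

    endpoint_frame is the frame at which end of speech is declared.
    Returns None if no turn is ever completed.
    """
    start = None
    speech = 0
    silence = 0
    for i, is_speech in enumerate(frames):
        if is_speech:
            if start is None:
                start = i
            speech += 1
            silence = 0  # any speech-like frame resets the silence timer
        elif start is not None:
            silence += 1
            if silence >= end_silence_frames:
                if speech >= min_speech_frames:
                    return (start, i)
                # Too short to count as speech: discard and keep listening.
                start, speech, silence = None, 0, 0
    return None


clean = [False] * 2 + [True] * 4 + [False] * 6
short = [False] * 2 + [True] * 2 + [False] * 6
noisy = [False] * 2 + [True] * 4 + [False, False, True] * 3 + [False, False]

print(detect_turn(clean))  # (2, 10): turn detected, endpoint after 5 quiet frames
print(detect_turn(short))  # None: the short utterance is dropped as noise
print(detect_turn(noisy))  # None: noise keeps resetting the silence timer
```

In this simplified picture, a cleaner speech/no-speech signal lets the same logic run with tighter thresholds, which is the kind of effect an enhancement model tuned for machine understanding is meant to have on VAD and turn-taking.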

From there, telli adopted Quail VAD and the newly launched Quail Voice Focus 2.1, formalizing call quality into a structured, benchmark-driven process. The results spoke for themselves:

Voice Focus 2.1 in action

Raw audio

Allô?

Yeah, I have a few minutes.

What did you say that he took a while?

Yeah, that's correct.

To get him.

Yeah, it's okay.

We can go through your questions now.

Quail Voice Focus 2.1

Allô?

Yeah, I have a few minutes.

Yeah, that's correct.

Yeah, it's okay.

We can

we can go through your questions now.
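
A benchmark-driven process needs a number to track, and word-level edit distance is a common starting point. The Python sketch below is an illustration rather than telli's actual benchmark: it measures how far the two transcripts above diverge from each other. A true word error rate would compare each transcript against a human-verified reference, which the article does not publish, so `word_error_rate` is included only as the function you would call once such a reference exists.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[\w']+", text.lower())


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum word insertions, deletions and substitutions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the reference length."""
    ref = tokenize(reference)
    if not ref:
        raise ValueError("reference transcript is empty")
    return word_edit_distance(ref, tokenize(hypothesis)) / len(ref)


# Transcripts copied from the example above.
RAW = (
    "Allô? Yeah, I have a few minutes. What did you say that he took a while? "
    "Yeah, that's correct. To get him. Yeah, it's okay. "
    "We can go through your questions now."
)
ENHANCED = (
    "Allô? Yeah, I have a few minutes. Yeah, that's correct. Yeah, it's okay. "
    "We can we can go through your questions now."
)

raw_words, enhanced_words = tokenize(RAW), tokenize(ENHANCED)
print("words in raw transcript:", len(raw_words))
print("words in enhanced transcript:", len(enhanced_words))
print("word edits between them:", word_edit_distance(raw_words, enhanced_words))
```

Tracked per language and per call type against verified references, a metric like this turns "the audio sounds better" into something that can be compared across model versions.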

Going enterprise

telli is reshaping how Europe's largest companies run customer engagement in production. Working with enterprises like Sky, Europe's leading entertainment and telecommunications company, telli went live in under three weeks, with ai-coustics brought in to improve audio quality across the deployment. Setups like this only work when the fundamentals are right, and audio is one that most teams underestimate.