Phonely x ai-coustics: Speaker isolation and audio insight for production Voice AI

Written by

Nell Campbell

Head of Marketing

Case studies

Jun 18, 2026

Founded in San Francisco and backed by Y Combinator, Phonely recently raised a Series A, with three of its own enterprise customers investing in the round. That kind of conviction from customers speaks to what Phonely is building.

While much of the voice AI space is still figuring out call handling, Phonely is already running structured lead qualification pipelines, multi-vertical inbound flows and production deployments at scale for the kinds of businesses where every missed call has a real cost. Their agents don't just answer the phone; they qualify, route and convert.

Even the best agent logic can't compensate for bad audio. When a caller answers from a noisy room, or has the television on in the background, the issues start to cascade downstream. That's where ai-coustics comes in.

Fixing audio at the source

Transcription accuracy is the foundation of every voice agent. A spike in word error rate (WER) doesn't just mean bad transcriptions. It means failed intent detection, broken flows and calls that escalate or drop. In a structured qualification funnel, one misheard utterance can corrupt the whole session and ultimately cost the client revenue.

Phonely's call audio reflects the real world: callers connect from noisy homes, on mobile connections and across variable codecs, and each of these problems degrades downstream performance. Last year they set out to fix this with one clear objective: lower WER on degraded audio without adding latency to the ASR pipeline.

How Phonely chose ai-coustics

Phonely came across ai-coustics while optimizing speech-to-text (STT) accuracy and evaluating noise-cancellation models. Testing exposed a core problem: noise-cancellation models are often optimized for human listening, not machine transcription, and audio that sounds fine to a person can still significantly degrade STT accuracy.

To evaluate fairly, they benchmarked all candidates against a ground-truth label set, measuring word error rate consistently across the same conditions. Against both competitors and a no-processing baseline, ai-coustics came out ahead, including a 5% WER reduction versus the no-processing baseline. The decision was straightforward.
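As a rough illustration of what benchmarking against a ground-truth label set involves, the sketch below computes WER as the word-level edit distance between each reference transcript and the ASR output, divided by the total number of reference words. The function names and the sample utterance are assumptions made for illustration, not Phonely's evaluation code, and a real benchmark would also normalize punctuation and number formats before scoring.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, deletions, insertions)."""
    # prev[j] holds the distance between the reference words consumed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            curr.append(min(
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
            ))
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """WER over (reference, hypothesis) transcript pairs."""
    total_edits = 0
    total_ref_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.lower().split()
        hyp_words = hypothesis.lower().split()
        total_edits += edit_distance(ref_words, hyp_words)
        total_ref_words += len(ref_words)
    if total_ref_words == 0:
        raise ValueError("references must contain at least one word")
    return total_edits / total_ref_words


# Hypothetical utterance: one misheard digit out of six reference words.
sample = [("my number is five five five", "my number is nine five five")]
print(f"{corpus_wer(sample):.3f}")  # prints 0.167
```

Scoring every candidate, and the unprocessed audio, against the same reference transcripts is what makes the comparison like for like.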

From there, ai-coustics moved into production evaluation with one of Phonely's enterprise customers. Phonely first measured performance before enabling ai-coustics, then evaluated the post-processing period in production. That first post-processing window correlated with an improvement in billable outcomes and qualification rate.

Several months later, Phonely upgraded to Quail Voice Focus, which tackles the problem of identifying the primary speaker in chaotic real-world settings. In that production readout, average WER fell from 24.9% to 22.7% in absolute terms over hundreds of thousands of calls. The same period correlated with a 3.65% relative increase in calls that led to successful qualifications, and an 8.73% relative increase in calls that led to billable outcomes.

In production with an enterprise customer

For one enterprise customer, Phonely handles first-pass qualification on the customer's behalf: a structured funnel with measurable conversion at each step. The caller base reflects the reality of production voice AI, with a high proportion of callers phoning from home, on mobile connections and with considerable background noise. For this kind of workflow, audio quality directly affects whether the agent can understand, qualify and route the caller correctly.

The results below compare the production period before and after using Quail Voice Focus. Customer identity, sector, precise measurement dates and absolute production funnel values have been redacted for external sharing.

Enterprise customer benchmarks

Metric	Production result
WER	24.9% → 22.7%
Successful qualification impact	+3.65% relative increase
Billable outcome impact	+8.73% relative increase

*Relative increases are calculated by comparing ai-coustics Quail to Quail Voice Focus.
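The table mixes two kinds of figures: the WER row gives absolute before and after values, while the funnel rows give relative changes. The short sketch below shows the difference using only the published WER pair; the helper names are illustrative assumptions. The underlying funnel rates are redacted, so they are not reproduced here, but a relative increase would be computed from them in the same way.

```python
def absolute_change(before, after):
    """Difference in percentage points."""
    return after - before


def relative_change(before, after):
    """Change expressed as a fraction of the starting value."""
    return (after - before) / before


wer_before, wer_after = 24.9, 22.7  # published WER values, in percent

points = absolute_change(wer_before, wer_after)
fraction = relative_change(wer_before, wer_after)

print(f"{points:.1f} points absolute")  # prints -2.2 points absolute
print(f"{fraction:.1%} relative")       # prints -8.8% relative
```

Read this way, the drop from 24.9% to 22.7% is 2.2 percentage points, or roughly 8.8% of the starting WER.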

Validated in production

In production, raw WER is not the same as caller understanding. It can be affected by post-transfer speech, short calls and recording or transcript boundary artifacts.

To validate the results, Phonely measured whether ai-coustics increased errors on business-critical caller inputs, such as digits, yes/no confirmations and objection terms. Across a large production analysis, English caller understanding showed no measurable degradation.
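One simple way to approximate this kind of check is to count how often business-critical words in the reference transcript fail to appear in the ASR output, then compare that rate before and after enabling audio processing. The sketch below is an assumption-laden illustration, not Phonely's analysis: the term list is hypothetical (digits and yes/no only, since the objection terms are not published), and it uses a bag-of-words comparison rather than a proper word alignment.

```python
from collections import Counter

# Hypothetical term list for illustration only.
CRITICAL_TERMS = {
    "yes", "no",
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
}


def critical_miss_rate(pairs, critical_terms=CRITICAL_TERMS):
    """Share of critical-term occurrences in the references that are
    missing from the matching hypotheses (bag-of-words approximation)."""
    total = 0
    missed = 0
    for reference, hypothesis in pairs:
        ref_counts = Counter(
            word for word in reference.lower().split() if word in critical_terms
        )
        hyp_counts = Counter(hypothesis.lower().split())
        for term, count in ref_counts.items():
            total += count
            missed += max(0, count - hyp_counts[term])
    return missed / total if total else 0.0


# Hypothetical transcripts: one of three critical words is misheard.
sample = [("yes five five", "yes five nine")]
print(f"{critical_miss_rate(sample):.3f}")  # prints 0.333
```

A stable or falling miss rate after the change would be consistent with "no measurable degradation"; a rising one would flag that processing is damaging the inputs that matter most.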

The important signal is that the production funnel moved in the right direction while caller understanding stayed stable. These results are observational and reflect all changes shipped during the measured periods, so they should not be read as a same-period A/B test proving ai-coustics alone caused the lift.

What's next: Audio Insight

Last week, ai-coustics launched Tyto. It's a lightweight audio insight model that runs on incoming call audio in real time, outputting a single Tyto Score between 0 and 1 that indicates how likely the audio is to cause downstream failures.

Alongside the score, Tyto Dimensions break down exactly what's driving it across different acoustic conditions, including noise, reverb, codec compression, dropouts and interfering speech from media devices like TVs. Phonely is now in the early stages of testing Tyto.

"Audio that sounds fine to a human can still be over-processed for a machine. Being able to prove that at scale changes everything."
Jeechieu Ta, Senior Backend Engineer, Phonely

The model was built with two use cases in mind: post-call analysis, to surface audio-caused failures and give customers objective evidence, and mid-call adaptation, using real-time scores to adjust agent behavior dynamically.
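To make the mid-call adaptation idea concrete, here is a minimal sketch of how an agent might react to a score between 0 and 1. Everything in it is hypothetical: it does not call any Tyto API, the thresholds and action names are invented for illustration, and it assumes that a higher score means the audio is more likely to cause downstream failures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdaptationPolicy:
    # Hypothetical thresholds; real values would need tuning on production data.
    confirm_threshold: float = 0.5
    reprompt_threshold: float = 0.8


def choose_action(score, policy=AdaptationPolicy()):
    """Map an audio-risk score in [0, 1] to a hypothetical agent behavior."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= policy.reprompt_threshold:
        # Audio is very likely to cause failures: ask the caller to repeat.
        return "ask_caller_to_repeat"
    if score >= policy.confirm_threshold:
        # Moderate risk: read critical inputs such as digits back to the caller.
        return "confirm_critical_inputs"
    return "continue"


for score in (0.2, 0.6, 0.9):
    print(score, choose_action(score))
```

Keeping the thresholds in a small policy object means they can be adjusted per customer or per vertical without touching the decision logic.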

Committed to evaluation

Phonely’s rigorous benchmarking and transparent view of where failure modes live are a big part of why their growth trajectory looks the way it does. In production voice AI, audio quality is not cosmetic. It is infrastructure. When you build on solid foundations, the results follow.

If you're interested in how ai-coustics can improve transcription accuracy and audio observability in your own voice AI stack, get in touch.
