Word Error Rate (WER)

What is Word Error Rate (WER)?

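Word Error Rate is the most common metric for measuring automatic speech recognition (ASR) accuracy. It's calculated by comparing a system's transcript against a reference transcript, counting the substitutions, deletions, and insertions, and dividing that count by the total number of words in the reference. Lower is better: a 5% WER means roughly 5 errors for every 100 words in the reference. Because insertions count as errors, WER can exceed 100% when the output contains many extra words.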

What is an example of Word Error Rate use?

If the reference is "Please reschedule my appointment" (4 words) and the ASR outputs "Please rescue my appointment," there's one substitution ("rescue" for "reschedule") and no deletions or insertions, giving a WER of 1/4 = 25%.

How does Word Error Rate work?

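WER = (Substitutions + Deletions + Insertions) / Total Words in Reference. It's a word-level edit-distance (Levenshtein) metric: the error count is the smallest number of word substitutions, deletions, and insertions needed to turn the reference into the system's transcript. For more info, read our blog on this topic.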
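
To make the formula concrete, here is a minimal Python sketch of a word-level edit-distance WER. The function name `word_error_rate` is illustrative, and the text normalization (lowercasing and splitting on whitespace) is an assumption made for this example; real evaluations usually also normalize punctuation and numerals, which is not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / number of words in the reference."""
    # Assumption: lowercase + whitespace tokenization is enough normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr

    # prev[-1] is the minimum number of word edits (S + D + I).
    return prev[-1] / len(ref)


print(word_error_rate("Please reschedule my appointment",
                      "Please rescue my appointment"))  # 0.25
```

The call at the end reproduces the example above: one substitution over four reference words. The sketch keeps only two rows of the edit-distance table, so memory grows with the length of the hypothesis rather than with the product of both lengths.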

How does ai-coustics use Word Error Rate?

WER is the primary metric we optimize our Quail family against. Rather than tuning for perceptual scores like PESQ or SIGMOS (which matter for human listeners, not machines), we measure how much Quail reduces WER on real-world audio (noisy calls, cheap headsets, low-bitrate codecs, multi-speaker environments). The result is cleaner input for ASR systems and more reliable voice agents downstream.
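
One common way to express "how much WER is reduced" is the relative reduction between a baseline transcript and a transcript produced from processed audio. The sketch below is an illustration only: the function name `relative_wer_reduction` and the numbers in the usage line are hypothetical, and it is not a description of the actual evaluation pipeline used for Quail.

```python
def relative_wer_reduction(baseline_wer: float, enhanced_wer: float) -> float:
    """Fraction of the baseline WER removed after processing the audio.

    Both arguments are WER values as fractions (0.20 means 20%).
    A positive result means fewer errors; a negative result means more.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be greater than zero")
    return (baseline_wer - enhanced_wer) / baseline_wer


# Hypothetical values, not measured results: a drop from 20% to 15% WER
# removes about a quarter of the baseline errors.
reduction = relative_wer_reduction(0.20, 0.15)
print(f"{reduction:.0%}")
```

Reporting the relative figure alongside the two absolute WER values avoids ambiguity, since "a 5% improvement" could mean either five percentage points or five percent of the baseline.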

Bring real-time audio intelligence into your voice AI stack