Transcription accuracy is the foundation of everything PodSearch.io does. If the transcript is wrong, search results are wrong. We take accuracy seriously, and we regularly benchmark our pipeline against industry standards. Here are our latest results.
Measuring Accuracy: Word Error Rate (WER)
The standard metric for speech recognition accuracy is Word Error Rate (WER). It counts the word-level errors in a transcript (substitutions, insertions, and deletions) and expresses them as a percentage of the words in the reference transcript. Lower is better. Human transcriptionists typically achieve 4-5% WER on clean audio.
WER Formula: WER = (Substitutions + Insertions + Deletions) / Total Words in Reference × 100%
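
To make the formula concrete, here is a minimal sketch of how WER can be computed with a word-level edit distance. It is an illustration, not our production scoring code: the function name `word_error_rate` and the normalization (lowercasing and splitting on whitespace) are assumptions made for the example. Whitespace splitting in particular does not suit languages written without spaces between words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage, using word-level Levenshtein distance.

    Assumption for this sketch: words are compared case-insensitively
    after splitting on whitespace. Real pipelines usually normalize
    punctuation and numbers as well.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr

    # Minimum number of substitutions + insertions + deletions,
    # divided by the number of reference words, times 100.
    return prev[-1] / len(ref) * 100


reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown box jumps over lazy dog"
print(f"WER: {word_error_rate(reference, hypothesis):.1f}%")
```

In this example the hypothesis has one substitution ("box" for "fox") and one deletion (the second "the") against a 9-word reference, so the result is 2 / 9, or about 22.2%.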
Our Results
We benchmarked our pipeline on a curated set of 500 podcast episodes spanning different genres, languages, and audio conditions. Here are the results:
By Audio Condition
By Language
| Language | Episodes Tested | Avg WER | Rating |
|---|---|---|---|
| English | 300 | 5.8% | Excellent |
| Spanish | 50 | 7.2% | Very Good |
| French | 40 | 7.8% | Very Good |
| German | 35 | 8.1% | Good |
| Portuguese | 30 | 8.5% | Good |
| Japanese | 25 | 11.3% | Moderate |
| Mandarin | 20 | 12.1% | Moderate |
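
As a quick sanity check on the table above, the per-language figures can be combined into one episode-weighted average. The sketch below is derived only from the numbers in the table. It weights each language by its episode count, which is an assumption about how to aggregate: a corpus-level WER would weight by reference word count instead, and could differ.

```python
# (language, episodes tested, average WER in %) copied from the table above
results = [
    ("English", 300, 5.8),
    ("Spanish", 50, 7.2),
    ("French", 40, 7.8),
    ("German", 35, 8.1),
    ("Portuguese", 30, 8.5),
    ("Japanese", 25, 11.3),
    ("Mandarin", 20, 12.1),
]

total_episodes = sum(episodes for _, episodes, _ in results)

# Weight each language's average WER by how many episodes it contributed.
weighted_wer = sum(episodes * wer for _, episodes, wer in results) / total_episodes

print(f"{total_episodes} episodes, episode-weighted average WER: {weighted_wer:.2f}%")
```

The episode counts add up to the 500 episodes in the benchmark set, and the weighted average comes out at about 6.95%. English makes up 60% of the episodes, so it pulls the overall figure toward its own 5.8%.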
What Affects Accuracy
Through our benchmarking, we've identified the key factors that influence transcription quality:
- Audio quality is the single biggest factor. Studio-recorded episodes with clear audio consistently achieve WER under 5%.
- Background noise and music significantly degrade accuracy, especially music beds played under speech.
- Speaker overlap (cross-talk) causes the most errors in multi-speaker episodes.
- Domain-specific vocabulary (medical, legal, scientific terms) can increase errors if the terms are uncommon in training data.
- Accents and dialects have a moderate impact. Whisper, the speech recognition model our pipeline runs on, handles most English accents well, but very strong accents can increase WER by 2-3 percentage points.
Our Approach to Improving Accuracy
We're not standing still. Our roadmap includes upgrading to larger Whisper models for priority content, building a feedback loop where users can flag transcription errors, and exploring fine-tuning on podcast-specific data. We also plan to add speaker diarization to better handle multi-speaker episodes.
Accuracy matters because every word counts when you're searching for a specific quote or topic. We're committed to continuous improvement.