Microsoft Paza: Public Benchmark for Speech Recognition in 39 African Languages
- Launched the first dedicated ASR leaderboard for low-resource languages
- Compares the performance of 52 state-of-the-art ASR and language models
- Also released 3 fine-tuned models for 6 Kenyan languages
What happened?
Microsoft Research has released Paza, an automatic speech recognition (ASR) benchmarking platform for low-resource languages.[Microsoft Research] The name comes from the Swahili for ‘raise your voice’. The project has two parts: the PazaBench leaderboard and the Paza ASR models.
PazaBench is the first ASR leaderboard dedicated to low-resource languages. It measures the performance of 52 state-of-the-art ASR and language models across 39 African languages.[Microsoft Research] It tracks three metrics: Character Error Rate (CER), Word Error Rate (WER), and inverse real-time factor (RTFx).
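To make the first two metrics concrete, here is a minimal sketch of how WER and CER are typically computed from an edit distance; the helper names and the Swahili example phrase are illustrative and not taken from PazaBench’s evaluation harness.

```python
# Minimal sketch of the error-rate metrics PazaBench reports.
# The helpers and example sentences are illustrative, not from the benchmark.

def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance: substitutions + insertions + deletions."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, start=1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,        # deletion
                dp[j - 1] + 1,    # insertion
                prev + (r != h),  # substitution (free when tokens match)
            )
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits divided by reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: the same ratio computed over characters."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

if __name__ == "__main__":
    ref = "paza sauti yako"   # Swahili: "raise your voice"
    hyp = "paza sauti yangu"
    print(f"WER: {wer(ref, hyp):.2f}")  # 1 substituted word / 3 words ≈ 0.33
    print(f"CER: {cer(ref, hyp):.2f}")  # 3 character edits / 15 chars = 0.20
```

Both metrics are ratios against the reference length, so a WER of 0.33 means roughly one word in three was transcribed incorrectly.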
Why is it important?
Most speech recognition systems today are optimized for high-resource languages such as English and Chinese. Although African languages have more than one billion speakers, model and tooling support for them has lagged behind. Microsoft’s Project Gecko research likewise found that “speech systems fail in real low-resource environments.”[Microsoft Research]
The Paza team emphasized that “creating useful speech models in low-resource environments is not just a data problem, but also a design and evaluation problem.” The point is not simply to add more languages, but to build the technology together with local communities.
What happens next?
Paza has released three fine-tuned models for six Kenyan languages (Swahili, Dholuo, Kalenjin, Kikuyu, Maasai, and Somali): Paza-Phi-4-Multimodal-Instruct, Paza-MMS-1B-All, and Paza-Whisper-Large-v3-Turbo. Coverage is expected to expand to more African languages over time. Because the benchmark is open, researchers can freely test and improve their own models against it.
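As a rough idea of how such a checkpoint could be exercised once its weights are published, the sketch below runs transcription through the Hugging Face transformers ASR pipeline; the model ID, audio file name, and language setting are placeholders rather than confirmed Paza release details.

```python
# Minimal transcription sketch using the transformers ASR pipeline.
# The base Whisper checkpoint is a stand-in; swap in the published Paza
# fine-tune once its Hugging Face repository ID is known.
from transformers import pipeline

MODEL_ID = "openai/whisper-large-v3-turbo"  # placeholder for a Paza checkpoint

asr = pipeline(
    "automatic-speech-recognition",
    model=MODEL_ID,
    chunk_length_s=30,  # split long recordings into 30-second chunks
)

# Decoding a local file requires ffmpeg; "clip_swahili.wav" is a placeholder.
result = asr("clip_swahili.wav", generate_kwargs={"language": "swahili"})
print(result["text"])
```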
Frequently Asked Questions (FAQ)
Q: Which languages does the Paza benchmark support?
A: It currently covers 39 African languages, including Swahili, Yoruba, and Hausa, and also provides fine-tuned models for 6 Kenyan languages. The benchmark runs as a public leaderboard, so researchers can directly compare model performance.
Q: What performance metrics does PazaBench measure?
A: It measures three metrics. Character Error Rate (CER) counts character-level errors (substitutions, insertions, and deletions) against a reference transcript, while Word Error Rate (WER) does the same at the word level. RTFx (inverse real-time factor) captures how much faster than real time a model transcribes audio, which helps predict responsiveness in actual deployment.
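A common convention for RTFx is total seconds of audio divided by total wall-clock seconds of processing; the sketch below assumes that definition, and `transcribe` is a stand-in for whatever model is being timed, not a PazaBench API.

```python
import time

def measure_rtfx(transcribe, clips):
    """Time a transcribe(waveform) callable over (waveform, duration_s) pairs.

    Returns audio seconds processed per wall-clock second; an RTFx above 1
    means the model runs faster than real time.
    """
    audio_seconds = 0.0
    start = time.perf_counter()
    for waveform, duration_s in clips:
        transcribe(waveform)
        audio_seconds += duration_s
    elapsed = time.perf_counter() - start
    return audio_seconds / elapsed
```

Under this definition, an RTFx of 20 means one hour of audio is transcribed in about three minutes.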
Q: Why is speech recognition difficult for low-resource languages?
A: Training data is in severely short supply. While English has tens of thousands of hours of transcribed speech, many African languages have only a few hundred hours. Evaluation is also difficult because of wide dialectal variation and because some languages lack a standardized orthography.
If you found this article useful, please subscribe to AI Digester.
References
- Paza: Introducing automatic speech recognition benchmarks and models for low resource languages – Microsoft Research (2026-02-04)
- Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters – arXiv (2023-05-22)
- Whisper: Robust Speech Recognition via Large-Scale Weak Supervision – OpenAI (2022-09-21)