SloPalSpeech: 2,806-Hour Slovak Speech Corpus Reduces Word Error Rate by 70% for Low-Resource ASR

Automatic Speech Recognition (ASR) systems struggle with languages for which little training data exists, and Slovak is one such language. To address this, Erik Božík from VÚB Banka and Marek Šuppa from Comenius University in Bratislava, along with colleagues, created SloPalSpeech, a new dataset comprising 2,806 hours of Slovak speech drawn from parliamentary proceedings. The team developed a method to process these lengthy recordings into clean, aligned audio segments paired with accurate transcripts, and then used this resource to substantially improve the performance of OpenAI’s Whisper models. Fine-tuning on SloPalSpeech reduces Word Error Rate (WER) on established Slovak benchmarks, with the smallest model seeing a 70% reduction that brings its accuracy close to that of much larger systems. The complete dataset, transcripts and models are publicly available to accelerate future research in low-resource speech recognition.
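WER counts the word substitutions (S), deletions (D) and insertions (I) needed to turn a model's output into the reference transcript, divided by the number of reference words N: WER = (S + D + I) / N. A reduction like "70%" is usually reported in relative terms, meaning the fine-tuned WER is about 30% of the baseline WER. The minimal sketch below computes both quantities with a textbook word-level edit distance rather than any particular evaluation toolkit; it only splits on whitespace (real benchmark scoring typically normalises case and punctuation first), and the WER values in the example are hypothetical, not figures from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed as a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            same = ref[i - 1] == hyp[j - 1]
            curr[j] = min(
                prev[j] + 1,                       # deletion of ref[i-1]
                curr[j - 1] + 1,                   # insertion of hyp[j-1]
                prev[j - 1] + (0 if same else 1),  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed, e.g. 0.7 for a 70% reduction."""
    return (baseline_wer - new_wer) / baseline_wer


# One substituted word out of four reference words gives WER = 0.25.
print(word_error_rate("vážený pán predseda ďakujem", "vážený pan predseda ďakujem"))
# Hypothetical WERs: dropping from 0.50 to 0.15 is a 70% relative reduction.
print(round(relative_reduction(0.50, 0.15), 2))
```

Because the metric is normalised by the reference length, WER can exceed 1.0 when a model inserts many extra words, which is one reason relative reductions are the more readable way to compare models.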

The team built a robust processing pipeline that transforms these long-form recordings into precisely aligned, 30-second audio-transcript pairs suitable for training speech recognition models. The pipeline first generates reference transcripts with the WhisperX framework, which combines Whisper’s transcription capabilities with external alignment models to produce word-level timestamps, and then aligns these with the existing parliamentary transcripts to establish accurate timestamps. The core of the method relies on identifying “anchors”: words present in both the generated and the existing transcripts, which provide reliable alignment points. The resulting dataset of aligned audio and transcripts contains approximately 60 million words. Fine-tuning several OpenAI Whisper models on this resource substantially reduced word error rates on standard Slovak benchmarks, with particularly notable gains for smaller models, and helps move Slovak beyond the status of a low-resource language for automatic speech recognition.
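The anchor idea can be illustrated with a short Python sketch. It assumes the ASR pass (for example WhisperX) has already produced a list of words with start and end times in seconds, and that the official parliamentary transcript is available as plain text. The names `TimedWord`, `find_anchors` and `segment_transcript` are hypothetical, and matching words with the standard library's `difflib.SequenceMatcher` is a stand-in for whatever matching the authors actually use; the real pipeline may differ in its normalisation, anchor criteria and segment boundaries.

```python
import difflib
import re
from dataclasses import dataclass


@dataclass
class TimedWord:
    text: str
    start: float  # seconds from the start of the recording
    end: float


def normalize(word: str) -> str:
    # Lowercase and drop punctuation so "predseda," matches "predseda".
    # \w is Unicode-aware, so Slovak diacritics (á, ď, ž, ...) are kept.
    return re.sub(r"[^\w]", "", word.lower())


def find_anchors(asr_words: list[TimedWord], official_words: list[str]) -> list[tuple[int, int]]:
    """Return (asr_index, official_index) pairs for words that appear in both
    transcripts in the same order; these matched words act as anchors."""
    a = [normalize(w.text) for w in asr_words]
    b = [normalize(w) for w in official_words]
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    anchors = []
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            if a[block.a + k]:  # skip tokens that were pure punctuation
                anchors.append((block.a + k, block.b + k))
    return anchors


def segment_transcript(asr_words, official_words, max_len=30.0):
    """Cut the official transcript into (start, end, text) chunks no longer
    than max_len seconds, placing boundaries only at anchor words."""
    segments = []
    seg_start = seg_first = last_end = last_idx = None
    for asr_i, off_i in find_anchors(asr_words, official_words):
        word = asr_words[asr_i]
        if seg_start is None:
            seg_start, seg_first = word.start, off_i
        elif word.end - seg_start > max_len:
            # Close the current chunk at the previous anchor. Unmatched official
            # words lying between two chunks are dropped: their timing is unknown.
            segments.append((seg_start, last_end, " ".join(official_words[seg_first:last_idx + 1])))
            seg_start, seg_first = word.start, off_i
        last_end, last_idx = word.end, off_i
    if seg_start is not None:
        segments.append((seg_start, last_end, " ".join(official_words[seg_first:last_idx + 1])))
    return segments


# Toy example with made-up timings (not data from SloPalSpeech).
asr_words = [
    TimedWord("Vážený", 0.0, 0.5), TimedWord("pán", 0.5, 0.8),
    TimedWord("predseda", 0.8, 1.4), TimedWord("dámy", 20.0, 20.5),
    TimedWord("a", 20.5, 20.6), TimedWord("páni", 20.6, 21.0),
    TimedWord("ďakujem", 40.0, 40.8),
]
official_words = "Vážený pán predseda, vážené dámy a páni. Ďakujem.".split()
for seg in segment_transcript(asr_words, official_words):
    print(seg)
```

In this toy input the ASR pass missed the word "vážené", yet it still ends up in the first chunk (0.0 to 21.0 s) because it sits between two anchors; the final "Ďakujem." starts a second chunk because including it would stretch the first one past 30 seconds. Cutting only at anchors is what keeps the official text, rather than the ASR guess, as the training target.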

The study shows that parliamentary speech is a valuable resource for training Slovak ASR models and provides a methodology applicable to similar long-form audio alignment tasks. All resources, including the dataset, transcripts, and fine-tuned models, have been publicly released to facilitate further research. The authors acknowledge that Whisper models are prone to occasional “hallucinations”, sometimes manifesting as parliamentary phrasing, and recommend mitigations such as compression ratio checks. They also observed a trade-off: improved Slovak performance came with a reduction in English transcription capability. Future work could extend this alignment method to parliamentary recordings from other European countries, expanding ASR resources for multiple low-resource languages.
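A compression ratio check flags transcripts that compress unusually well, a symptom of the repetitive loops Whisper can fall into. The sketch below reimplements the idea with Python's standard `zlib` module. The threshold of 2.4 matches the default `compression_ratio_threshold` of OpenAI's open-source `whisper` package (which retries decoding at a higher temperature when it is exceeded), but the authors' exact setting is not stated here, and `is_suspicious` is a hypothetical helper name. This check targets repetition only: a fluent but invented parliamentary sentence would pass it.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by zlib-compressed size. Repetitive text
    compresses well, so looping output yields a high ratio."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def is_suspicious(text: str, threshold: float = 2.4) -> bool:
    # Assumed threshold: 2.4 mirrors openai-whisper's default, not a value from the paper.
    return compression_ratio(text) > threshold


normal = "Vážený pán predseda, vážené dámy a páni, dovoľte mi predniesť návrh zákona."
looping = "Ďakujem pekne. " * 20

print(round(compression_ratio(normal), 2), is_suspicious(normal))
print(round(compression_ratio(looping), 2), is_suspicious(looping))
```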

👉 More information
🗞 SloPalSpeech: A 2,806-Hour Slovak Speech Corpus from Parliamentary Data
🧠 ArXiv: https://arxiv.org/abs/2509.19270

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
