GraDeT-HTR: Resource-Efficient Bengali Handwritten Text Recognition System Achieves Improved Accuracy with Grapheme-Based Tokenizer

Although Bengali is the sixth most spoken language in the world, automated recognition of its handwritten text remains a significant challenge, hampered by the script’s complexity and limited training data. Md. Mahmudul Hasan, Ahmed Nesar Tahsin Choudhury, and Md. Mahmudul Hasan, all from the University of Dhaka, alongside Md. Mosaddek Khan, present a new system, GraDeT-HTR, which tackles this problem with a resource-efficient approach. The system is built on a decoder-only Transformer architecture and adds a novel grapheme-based tokenizer that recognizes the fundamental building blocks of the Bengali script. The tokenizer significantly improves recognition accuracy compared to standard methods, and the system achieves state-of-the-art performance on multiple benchmark datasets after pre-training on synthetic data and fine-tuning on real handwritten samples.

Synthetic Data and Two-Stage Bengali OCR

The researchers have created a new system for converting handwritten Bengali documents into text, addressing a significant challenge in optical character recognition. Training proceeds in two phases: the team first generates large amounts of synthetic data to supplement the limited real handwritten samples, then pre-trains the model and fine-tunes it to recognize the characters. Realistic distortions, such as wavy lines, blur, and fragments, are added to the synthetic images to make them more effective as training data. Pre-training itself has two stages, first on line-level images and then on word-level images, so the model learns features at different levels of detail. Performance is measured with character-level and word-level error metrics, and large language models are additionally used to assess transcription quality. A web interface lets users upload documents and extract the text.
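The character-level and word-level error metrics mentioned above are commonly computed as character error rate (CER) and word error rate (WER), both based on edit distance. The following is a minimal sketch under that assumption, not the authors' evaluation code; the function names are hypothetical, and characters are counted as Unicode code points, which for Bengali is not the same as counting graphemes.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Insertions, deletions, and substitutions each cost 1.
    Works on strings (characters) and on lists (words).
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits / reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


def wer(reference, hypothesis):
    """Word error rate: word edits / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


print(cer("kitten", "sitting"))          # 0.5 (3 edits over 6 characters)
print(wer("the cat sat", "the cat sit"))
```

Both rates can exceed 1.0 when the hypothesis is much longer than the reference, because the denominator is the reference length only.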

Grapheme Tokenizer Boosts Bengali Script Recognition

GraDeT-HTR is designed to overcome the two main obstacles to Bengali handwritten text recognition: the script’s complexity and the scarcity of available data. It uses a decoder-only Transformer architecture together with a grapheme-based tokenizer. The tokenizer is designed to handle the approximately 13,000 graphemes present in the Bengali script, allowing for more accurate character representation. The system operates as an end-to-end pipeline that integrates text detection and recognition for full-page images, beginning with a module that segments each image into individual words. Evaluation supports the approach: the system achieves state-of-the-art performance on multiple benchmark datasets.
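To illustrate what a grapheme-based tokenizer does, here is a minimal sketch that splits Bengali text into grapheme clusters and maps each cluster to an integer ID. It is an illustration under simplifying assumptions, not the tokenizer from the paper: the helper names are hypothetical, the rules cover only the core Bengali Unicode block, and attaching modifiers such as anusvara to the preceding cluster is one possible convention among several.

```python
VIRAMA = "\u09CD"  # hasanta: joins consonants into a conjunct


def is_consonant(ch):
    cp = ord(ch)
    return 0x0995 <= cp <= 0x09B9 or cp in (0x09DC, 0x09DD, 0x09DF)


def is_dependent(ch):
    """Signs that cannot stand alone and attach to the preceding cluster."""
    cp = ord(ch)
    return (
        0x09BE <= cp <= 0x09CC  # dependent vowel signs
        or cp == 0x09D7         # au length mark
        or cp in (0x0981, 0x0982, 0x0983, 0x09BC)  # chandrabindu, anusvara, visarga, nukta
    )


def split_graphemes(text):
    """Split text into grapheme clusters (assumed, simplified rules)."""
    clusters = []
    for ch in text:
        joins_conjunct = (bool(clusters) and clusters[-1].endswith(VIRAMA)
                          and is_consonant(ch))
        attaches = bool(clusters) and (is_dependent(ch) or ch == VIRAMA)
        if joins_conjunct or attaches:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def build_vocab(words):
    """Assign an ID to each distinct grapheme, in order of first appearance."""
    vocab = {}
    for word in words:
        for g in split_graphemes(word):
            vocab.setdefault(g, len(vocab))
    return vocab


def encode(word, vocab):
    return [vocab[g] for g in split_graphemes(word)]


stri = "\u09B8\u09CD\u09A4\u09CD\u09B0\u09C0"  # স্ত্রী: six code points
bangla = "\u09AC\u09BE\u0982\u09B2\u09BE"      # বাংলা: five code points
vocab = build_vocab([stri, bangla])
print(len(split_graphemes(stri)), encode(bangla, vocab))
```

In this sketch the conjunct স্ত্রী becomes a single token rather than six code-point tokens, which is the kind of unit a grapheme-level vocabulary is meant to capture.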

Bengali Handwriting Recognition with Grapheme-Based Transformers

GraDeT-HTR is a resource-efficient system for recognizing handwritten Bengali text, built on a decoder-only Transformer with a grapheme-based tokenizer designed for the nuances of the Bengali script. It recognizes text at the word level and delivers the result in an editable format, so users can correct errors. The interface supports multi-page document review and exports to plain text or Microsoft Word. The complete system pipeline has been released publicly under an open-source license. Future work will focus on expanding the training dataset to include more diverse backgrounds and noise, pre-training larger language models for Bengali, exploring alternative auto-regressive language models, and refining text detection to reduce segmentation errors.
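Because recognition happens word by word, the recognized words have to be put back into reading order before the page can be shown as editable text. The sketch below shows one simple way this could be done, grouping word boxes into lines by vertical position and sorting each line left to right. The `WordBox` type, the `assemble_lines` function, and the `tolerance` parameter are hypothetical and are not taken from the released system.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    text: str  # recognized word
    x: float   # left edge of the word's bounding box
    y: float   # top edge of the word's bounding box
    h: float   # height of the bounding box


def assemble_lines(words, tolerance=0.5):
    """Group word boxes into lines, then order each line left to right.

    A word joins the current line when its vertical center lies within
    `tolerance` times its own height of the line's first word.
    """
    lines = []
    for w in sorted(words, key=lambda w: w.y + w.h / 2):
        center = w.y + w.h / 2
        if lines and abs(center - lines[-1]["center"]) <= tolerance * w.h:
            lines[-1]["words"].append(w)
        else:
            lines.append({"center": center, "words": [w]})
    return "\n".join(
        " ".join(w.text for w in sorted(line["words"], key=lambda w: w.x))
        for line in lines
    )


page = [
    WordBox("world", x=60, y=11, h=20),
    WordBox("hello", x=0, y=10, h=20),
    WordBox("again", x=0, y=50, h=20),
]
print(assemble_lines(page))
```

A fixed tolerance like this would likely struggle with slanted or curved handwriting, which is one reason segmentation quality matters for the final transcript.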

👉 More information
🗞 GraDeT-HTR: A Resource-Efficient Bengali Handwritten Text Recognition System utilizing Grapheme-based Tokenizer and Decoder-only Transformer
🧠 ArXiv: https://arxiv.org/abs/2509.18081

Rohail T.

