LSTM Network Achieves 78% Accuracy in Bangla Music Genre Classification

Researchers are tackling the challenge of automatically categorising the rapidly expanding catalogue of Bangla music. Muntakimur Rahaman (Bangladesh Army International University of Science and Technology), Md Mahmudul Hoque (CCN University of Science and Technology), and Md Mehedi Hassain et al. present a new dataset and a deep learning framework to address this need, a step towards improved music information retrieval. The work matters given the cultural richness of Bangla music and the increasing difficulty of manually indexing large digital libraries. The reported 78% classification accuracy indicates the potential of bidirectional Long Short-Term Memory (LSTM) networks and Mel-Frequency Cepstral Coefficients (MFCCs) for Bangla music genre classification.

Bangla Music Genre Classification via LSTM networks

Researchers report progress in Bangla music genre classification, addressing the need for efficient indexing and retrieval in a rapidly expanding digital music landscape. The approach uses deep learning to overcome limitations of the traditional machine learning methods often used for genre classification. Central to the work is the extraction of meaningful features from raw audio waveforms. The result is an advance in music information retrieval for a language and musical tradition previously underrepresented in automated classification research.
The researchers focused in particular on how the linguistic characteristics of Bengali influence musical features and overall classification performance, which gives a more nuanced picture of the interplay between language, music, and automated analysis. The team converted the original mp3 files to wav format and extracted features including Zero Crossing Rate, spectral centroid, and, crucially, MFCCs, which proved pivotal in the Music Information Retrieval process. The study then employed a Bidirectional LSTM model, feeding its output into a dense layer to generate class labels. This approach builds on previous work in genre classification, such as Kris West et al.’s emphasis on onset detection and the use of unsupervised decision trees, and S. Patil et al.’s work on separating voiced and non-voiced speech segments using Zero Crossing Rate and energy-based features.
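To make the MFCC step more concrete, the following is a minimal sketch of how MFCCs can be computed for a single audio frame: power spectrum, triangular mel filterbank, log, then a discrete cosine transform. The frame length, sample rate, filter count, and all function names are illustrative assumptions, not settings reported by the authors, and in practice an audio library would normally do this work.

```python
import cmath
import math


def hz_to_mel(hz):
    """Common mel-scale formula (an assumption; other variants exist)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum of a Hamming-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + (high - low) * i / (num_filters + 1)
                  for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):      # rising edge of the triangle
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling edge of the triangle
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def mfcc_frame(frame, sample_rate, num_filters=20, num_coeffs=13):
    """MFCCs for one frame: log mel energies followed by a DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # A small constant avoids log(0) for empty or silent bands.
    log_energies = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                    for row in bank]
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / num_filters)
                for m, e in enumerate(log_energies))
            for c in range(num_coeffs)]


# Hypothetical input: a 440 Hz tone sampled at 16 kHz, one 256-sample frame.
SAMPLE_RATE = 16000
frame = [math.sin(2 * math.pi * 440.0 * t / SAMPLE_RATE) for t in range(256)]
coeffs = mfcc_frame(frame, SAMPLE_RATE)
```

Applied frame by frame across a song, a routine like this yields a sequence of coefficient vectors, which is the kind of time-ordered input a recurrent model consumes.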

Bangla Music Genre Classification Using LSTM Networks

The work presents an automated Bangla music genre classification system, intended to help navigate the growing volume of music in digital and physical formats. The pipeline began by converting the original mp3 files into wav format for analysis. Feature extraction then used Zero-Crossing Rate (ZCR), spectral centroid, and, crucially, MFCCs, a cornerstone of Music Information Retrieval (MIR) and genre classification. The team extracted these features from each song in the ten genres: Bangla hip-hop, Bangla metal, Bangla rock, deshattobodhok, Palligiti, lalon giti, Nazrul Sangeet, Rabindra Sangeet, folk, and hamdanaat.
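The two simpler features named here can be illustrated in a few lines. The sketch below computes the zero-crossing rate and spectral centroid of one frame using only the standard library; the synthetic sine-wave inputs and the 8 kHz sample rate are assumptions chosen for illustration and do not come from the study.

```python
import cmath
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (in Hz) of the frame's spectrum."""
    n = len(frame)
    mags = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]
    total = sum(mags)
    if total == 0:
        return 0.0  # silent frame: no meaningful centroid
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def sine(freq, sample_rate, n):
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)]


# Hypothetical test tones: a low and a high pitch at an 8 kHz sample rate.
SAMPLE_RATE = 8000
low_tone = sine(250.0, SAMPLE_RATE, 256)
high_tone = sine(2000.0, SAMPLE_RATE, 256)

low_zcr = zero_crossing_rate(low_tone)
high_zcr = zero_crossing_rate(high_tone)
low_centroid = spectral_centroid(low_tone, SAMPLE_RATE)
high_centroid = spectral_centroid(high_tone, SAMPLE_RATE)
```

Both features rise with the higher-pitched tone, which is why they serve as coarse indicators of how bright or noisy a passage is.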

This feature engineering process gives the LSTM network the inputs it needs to learn the characteristics of each genre. Combining LSTM networks with MFCC-based features allows the model to capture temporal dependencies within the audio data. The research also addresses the challenges of limited labelled datasets and complex feature extraction in Bangla music classification. The authors present the framework as robust and computationally efficient, with potential benefits for the Bangla music industry and for listeners.
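As a rough illustration of what "bidirectional" means here, the sketch below runs one LSTM forwards over a sequence of feature frames and a second one backwards, concatenates their states, and passes a summary through a dense softmax layer. It is a toy forward pass with random, untrained weights; the layer sizes, weight initialisation, and every name in it are assumptions, and the authors' actual architecture and framework are not described at this level of detail.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def make_lstm_params(input_size, hidden_size, rng):
    """Random weights for the input (i), forget (f), candidate (g), output (o) gates."""
    return {name: (random_matrix(hidden_size, input_size, rng),
                   random_matrix(hidden_size, hidden_size, rng),
                   [0.0] * hidden_size)
            for name in "ifgo"}


def gate(params, x, h, activation):
    w, u, b = params
    return [activation(sum(wi * xi for wi, xi in zip(w[j], x))
                       + sum(ui * hi for ui, hi in zip(u[j], h)) + b[j])
            for j in range(len(b))]


def lstm_pass(sequence, params, hidden_size):
    """Return the hidden state after each time step of a single LSTM."""
    h, c, states = [0.0] * hidden_size, [0.0] * hidden_size, []
    for x in sequence:
        i = gate(params["i"], x, h, sigmoid)
        f = gate(params["f"], x, h, sigmoid)
        g = gate(params["g"], x, h, math.tanh)
        o = gate(params["o"], x, h, sigmoid)
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        states.append(h)
    return states


def bidirectional_states(sequence, fwd, bwd, hidden_size):
    """Concatenate forward states with time-aligned backward states."""
    forward = lstm_pass(sequence, fwd, hidden_size)
    backward = lstm_pass(sequence[::-1], bwd, hidden_size)[::-1]
    return [f + b for f, b in zip(forward, backward)]


def classify(sequence, fwd, bwd, dense, hidden_size):
    """Dense softmax layer over the final state of each direction."""
    states = bidirectional_states(sequence, fwd, bwd, hidden_size)
    summary = states[-1][:hidden_size] + states[0][hidden_size:]
    logits = [sum(w * s for w, s in zip(row, summary)) for row in dense]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    return [e / sum(exps) for e in exps]


# Hypothetical sizes: 13 MFCCs per frame, 4 hidden units, 10 genres, 6 frames.
rng = random.Random(0)
N_MFCC, HIDDEN, N_GENRES, N_FRAMES = 13, 4, 10, 6
fwd = make_lstm_params(N_MFCC, HIDDEN, rng)
bwd = make_lstm_params(N_MFCC, HIDDEN, rng)
dense = random_matrix(N_GENRES, 2 * HIDDEN, rng)
frames = [[rng.uniform(-1, 1) for _ in range(N_MFCC)] for _ in range(N_FRAMES)]
probs = classify(frames, fwd, bwd, dense, HIDDEN)  # one probability per genre
```

The design point is that the backward half of each state depends on later frames, so the representation of any moment in a song can draw on what follows it as well as what came before.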

The team measured performance by the model’s ability to correctly identify genres within the dataset, reaching 78% accuracy. MFCCs were extracted to capture the essential characteristics of each musical piece. The Bidirectional LSTM model, coupled with a dense output layer, generated the class labels for the Bangla music samples. The framework’s performance depends on its ability to model patterns and temporal dependencies in the musical data, an advantage over traditional methods.

The original mp3 files were converted to wav format before feature extraction, which used Zero Crossing Rate (ZCR) and spectral centroid alongside MFCCs. ZCR and energy-based features have been used in earlier work to separate voiced and non-voiced speech segments; here, ZCR is one of the extracted features. The results cover genres including Bangla hip-hop, giti, Nazrul Sangeet, Rabindra Sangeet, folk, and hamdanaat, though performance varies considerably between genres. By addressing data scarcity challenges specific to Bangla music, the research aims to improve the representation and accessibility of this music in the digital space.

The findings show strong performance on Lalon Geet, with an F1-score of 0.91, and Polli Geeti, at 0.86. However, challenges remain with genres such as Folk (0.04) and Metal (0.39). The authors acknowledge limitations related to dataset size and algorithmic refinement, suggesting that improvements in these areas could further enhance the system’s performance and adaptability. Future work should focus on expanding the dataset and exploring more sophisticated feature engineering techniques to improve classification accuracy across all genres. The work has practical value for digital music services, broadcasting platforms, and streaming providers, supporting the development of improved recommendation systems and enhanced listener engagement. The research also contributes to the digital preservation and dissemination of Bangla musical heritage, aiding cultural conservation efforts. The study highlights the potential of deep learning approaches to streamline genre identification and improve the management of musical archives.
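The gap between a single overall accuracy figure and widely varying per-genre F1-scores is easier to see with a small worked example. The sketch below computes accuracy and per-class F1 from lists of true and predicted labels; the labels are invented toy data, not the study's results, and the function names are assumptions.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), computed separately for each label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label gains a false positive
            fn[t] += 1  # true label gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Invented toy labels: eight "lalon" clips, two "folk" clips, and a model
# that labels every clip as "lalon".
y_true = ["lalon"] * 8 + ["folk"] * 2
y_pred = ["lalon"] * 10

overall = accuracy(y_true, y_pred)
f1_by_genre = per_class_f1(y_true, y_pred)
```

In this toy case overall accuracy is 0.8 while the F1-score for the minority class is 0.0, which shows how an aggregate figure can mask a genre the model rarely or never gets right.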

👉 More information
🗞 Bangla Music Genre Classification Using Bidirectional LSTMS
🧠 ArXiv: https://arxiv.org/abs/2601.15083

Rohail T.
