Quantum Networks Boost Speech Emotion Detection to 80.12 Per Cent Accuracy

A new hybrid quantum-classical framework, HQTN-SER, improves emotion classification with limited qubits. Mahad Mohtashim and colleagues at the National University of Sciences & Technology (NUST), in collaboration with the NYUAD Research Institute and New York University, have developed the system to address key challenges in speech emotion recognition, a field often limited by subtle emotional cues and the need for extensive training data. Their research investigates how structured quantum tensor networks model correlations within speech representations, achieving accuracies of 80.12% on RAVDESS, 78.26% on SAVEE, and 73.51% on MDER. These results offer a reproducible baseline and demonstrate the potential of tensor network structure as an effective and practical design choice for quantum-assisted affective computing.

Quantum tensor networks enable high-accuracy speech emotion recognition with limited qubits

HQTN-SER, a new hybrid quantum-classical framework, achieved 80.12 per cent accuracy on the RAVDESS dataset, a strong result in speech emotion recognition, a task that has long struggled with subtle vocal cues and a heavy appetite for training data. Comparable accuracy has previously required either larger quantum systems or substantially more classical processing power. HQTN-SER instead delivers strong performance with a limited number of qubits, addressing a key constraint of current quantum hardware. The model uses quantum tensor networks, a method of structuring quantum bits to efficiently model relationships within speech data, and combines this with classical machine learning for end-to-end emotion classification.

Speech emotion recognition is increasingly important in applications such as call centre analytics, mental health monitoring, and more natural human-computer interfaces. However, the inherent variability in human speech, including differences in accent, speaking rate, and recording conditions, poses a significant challenge for accurate emotion detection. Traditional machine learning approaches often require vast amounts of labelled data to overcome these challenges, and can struggle to generalise to unseen speakers or environments.

The HQTN-SER framework maintains efficient parameter usage during training, even at low qubit counts, through a novel quantum tensor network module inspired by matrix product states, which structures interactions within speech data effectively. Matrix product states are a powerful tool in condensed matter physics for representing the ground states of many-body quantum systems, and their adaptation to speech data allows a compact, efficient representation of complex correlations. The model employs a tensor network architecture designed to capture temporal dependencies within speech signals, modelling how emotional cues evolve over time.

A fusion strategy blends quantum measurement features with classical machine learning embeddings to enhance emotion classification. Specifically, the quantum module generates feature vectors from measurements of the quantum state, which are then concatenated with embeddings derived from classical speech processing techniques such as Mel-frequency cepstral coefficients (MFCCs). Despite these promising results, performance currently relies on controlled laboratory conditions and does not yet translate strongly to real-world audio with significant noise and variability. Further research is needed to improve robustness to real-world acoustic conditions, potentially through data augmentation or the development of noise-resilient quantum circuits.
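To make the two ideas above concrete, here is a minimal, purely illustrative NumPy sketch, not the paper's actual architecture: a small matrix-product-state chain is contracted into a three-qubit state vector, per-qubit Z-expectation values are read off as "quantum features", and those are concatenated with a classical MFCC-style embedding. All shapes, bond dimensions, and the 13-coefficient embedding size are hypothetical choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def mps_contract(cores):
    """Contract a chain of MPS cores, each of shape (left, phys, right),
    into a full state tensor. An MPS stores a 2**n-amplitude state using
    only small per-site cores, which is what makes it compact."""
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.squeeze()

# Three 'sites' (qubits), physical dimension 2, bond dimension 2:
cores = [rng.standard_normal((1, 2, 2)),
         rng.standard_normal((2, 2, 2)),
         rng.standard_normal((2, 2, 1))]

state = mps_contract(cores).reshape(-1)   # 2**3 = 8 amplitudes
state /= np.linalg.norm(state)            # normalise to a valid state
probs = state**2                          # measurement probabilities

def z_expectation(probs, qubit, n=3):
    """<Z> on one qubit: probabilities weighted by +1 or -1 depending on
    that qubit's bit value in each basis state's index."""
    signs = np.array([1 if (i >> (n - 1 - qubit)) & 1 == 0 else -1
                      for i in range(probs.size)])
    return float(np.dot(signs, probs))

# 'Quantum' feature vector: one expectation value per qubit.
q_feats = np.array([z_expectation(probs, q) for q in range(3)])

# Fusion: concatenate with a classical MFCC-style embedding.
mfcc_embedding = rng.standard_normal(13)  # hypothetical classical features
fused = np.concatenate([q_feats, mfcc_embedding])
print(fused.shape)  # (16,): input to a classical classifier head
```

The fused vector would then feed a standard classical classifier head; the point of the sketch is only that the quantum side contributes a handful of bounded measurement features per example, which is why the qubit count can stay small.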

Quantum tensor networks explore potential for nuanced speech emotion recognition

The increasing need to build more intuitive technologies is driving development in recognising human emotion from speech, yet current systems struggle with the subtleties of real-world recordings. Quantum tensor networks, a method of structuring quantum bits, are utilised to model subtle vocal cues, offering a promising, albeit preliminary, step towards using quantum computing to address this challenge. The fundamental principle behind leveraging quantum computing for machine learning lies in the ability of quantum systems to represent and manipulate high-dimensional data spaces more efficiently than classical computers. Quantum tensor networks provide a means of encoding this data into a quantum state, allowing for the exploration of complex correlations that may be difficult to capture with classical methods. The potential benefits of this approach include improved accuracy, reduced computational complexity, and the ability to learn from smaller datasets. However, the field of quantum machine learning is still in its early stages, and significant challenges remain in terms of hardware development and algorithm design.
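One standard way to picture "encoding data into a quantum state" is amplitude encoding, sketched below under the assumption of a classical simulation; the paper's own encoding scheme is not reproduced here. A feature vector of length 2^n is normalised to unit length so its entries can serve as the amplitudes of an n-qubit state, which is how n qubits index a 2^n-dimensional space.

```python
import numpy as np

def amplitude_encode(x):
    """Map a real feature vector of length 2**n onto the amplitudes of an
    n-qubit state: normalise so the squared amplitudes sum to 1."""
    x = np.asarray(x, dtype=float)
    n = int(np.log2(x.size))
    assert 2**n == x.size, "length must be a power of two"
    return x / np.linalg.norm(x)

# 8 speech-derived features (hypothetical numbers) -> a 3-qubit state.
features = np.array([0.5, 1.0, 0.0, 2.0, 1.0, 0.5, 0.0, 1.0])
state = amplitude_encode(features)
print(np.sum(state**2))  # about 1.0: a valid quantum state vector
```

The exponential ratio between qubits and amplitudes (here 3 qubits for 8 values) is the source of the efficiency claims in this paragraph, though preparing such states on real hardware carries its own costs.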

By clarifying when and where these structured quantum modules can genuinely enhance affective computing, even with limited quantum resources, the work establishes a strong, reproducible baseline for future development. Consistent results across multiple datasets, including 80.12% accuracy on RAVDESS, demonstrate the potential of compact quantum modules to model complex vocal patterns. The RAVDESS dataset, comprising 24 actors portraying eight distinct emotions, provides a challenging benchmark for evaluating emotion recognition systems. Similarly, the SAVEE and MDER datasets offer diverse acoustic conditions and emotional expressions, further validating the generalisability of the HQTN-SER model.

Crucially, the system maintained stable performance using a limited number of qubits, addressing a key limitation of current quantum hardware, which is constrained by qubit counts, coherence times, and the fidelity of quantum operations. Developing algorithms that achieve meaningful results with a few qubits is therefore essential for near-term applications, moving quantum-assisted affective computing beyond theoretical possibility towards practical implementation. Future work will focus on scaling the model to larger datasets, exploring different tensor network architectures, and investigating quantum generative models to further enhance emotion recognition performance.

The research demonstrated that a hybrid quantum-classical model, HQTN-SER, successfully recognised emotions in speech with up to 80.12% accuracy on the RAVDESS dataset, and comparable results on SAVEE and MDER. This suggests that quantum tensor networks can effectively model the complex patterns in speech that indicate emotional state, even when using a small number of qubits. The model’s stable performance and low qubit requirements are important because current quantum computers have limited resources. The authors intend to scale the model to larger datasets and explore different network architectures to further improve performance.

👉 More information
🗞 HQTN-SER: Speech Emotion Recognition with Hybrid Quantum Tensor Networks
🧠 ArXiv: https://arxiv.org/abs/2605.14523

Stay current. See today’s quantum computing news on Quantum Zeitgeist for the latest breakthroughs in qubits, hardware, algorithms, and industry deals.
Quantum Strategist

Una covers the investment flows, government strategy and international dynamics shaping quantum technology commercialisation. Drawing on a background in technology policy and market analysis, she focuses on the decisions — funding rounds, trade policy, strategic partnerships — that determine whether quantum computing achieves real-world impact.
