NLP Models Struggle with Science Equations in Learning Analytics

Evaluations of natural language processing embedding models reveal substantial performance variations when analysing student work containing scientific equations. OpenAI’s text-embedding-3-large model demonstrated superior, though not overwhelming, capability in interpreting symbolic expressions compared with alternatives, with cost and transparency also influencing model choice for learning analytics applications.

The increasing application of natural language processing (NLP) to learning analytics offers opportunities to assess student understanding from textual responses. However, a critical limitation arises when analysing subjects reliant on precise notation, such as physics and mathematics, where symbolic expressions – equations, formulae, and other specialised characters – present a challenge for current NLP techniques. Researchers are now focusing on how effectively different embedding models – algorithms that translate text into numerical representations – can interpret these expressions. A study by Bleckmann, Tschisgale, et al., detailed in ‘Handling Symbolic Language in Student Texts: A Comparative Study of NLP Embedding Models’, systematically evaluates the performance of several contemporary embedding models when applied to physics-specific symbolic expressions found in authentic student work, considering both analytical similarity and integration within a machine learning framework.

Assessing the Representation of Scientific Symbolism in Natural Language Processing Embedding Models

Recent advances in Natural Language Processing (NLP) are increasingly employed within Learning Analytics (LA) to analyse student-generated text. A key technique involves utilising NLP embedding models – algorithms that map words or phrases to numerical vectors, capturing semantic relationships. However, a significant challenge arises when these models encounter science-related symbolic expressions – equations, formulae, and mathematical notation – which often receive insufficient attention in current LA methodologies. Existing research frequently either overlooks this issue or removes symbolic expressions from analysis, potentially introducing bias and reducing the efficacy of LA applications.
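To make “mapping words or phrases to numerical vectors” concrete, here is a minimal sketch that embeds physics expressions with a hashed bag of character trigrams and compares them by cosine similarity. The embedder, its 256-dimension width, and the example expressions are our own illustrative assumptions, not the models evaluated in the study.

```python
import hashlib
import math

DIM = 256  # toy width; real embedding models use hundreds to thousands of dimensions

def embed(text: str, dim: int = DIM) -> list[float]:
    """Toy embedder: hash character trigrams into a fixed-width count vector."""
    vec = [0.0] * dim
    padded = f"  {text}  "
    for i in range(len(padded) - 2):
        bucket = int(hashlib.md5(padded[i:i + 3].encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Two notational variants of the same kinematics relation, plus an unrelated formula
v1 = embed("v = v0 + a*t")
v2 = embed("v(t) = v_0 + a t")
v3 = embed("E = m c^2")

print(cosine(v1, v2) > cosine(v1, v3))  # the two variants share far more trigrams
```

Real embedding models produce dense, learned vectors rather than sparse counts, but the downstream operation, comparing vectors by cosine similarity, is the same.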

This study addresses this gap by investigating how contemporary embedding models differ in their ability to represent and interpret science-related symbolic expressions, a crucial step towards developing more robust and reliable LA systems. We rigorously evaluated several embedding models using physics-specific symbolic expressions extracted from authentic student responses, employing both similarity-based analyses and integration into a machine learning pipeline to comprehensively assess their performance. The findings reveal discernible differences in model performance, with OpenAI’s text-embedding-3-large consistently outperforming all other examined models, although the advantage was moderate, suggesting careful consideration is required during model selection.

We recognise that performance is not the sole determinant of an effective model. Factors beyond accuracy, such as computational cost, regulatory compliance (particularly regarding data privacy), and model transparency, are crucial elements in the decision-making process for LA researchers and practitioners. This holistic approach ensures that the chosen model delivers accurate results while aligning with ethical considerations and practical constraints.

The increasing prevalence of NLP in educational settings necessitates a critical examination of its limitations, particularly when applied to specialised domains like science, technology, engineering, and mathematics (STEM). Traditional NLP techniques often struggle with the unique characteristics of scientific language, including complex notation, specialised terminology, and the inherent ambiguity of mathematical expressions. Consequently, applying these techniques to student work in STEM fields can lead to inaccurate interpretations and flawed assessments of learning.

We began by assembling a diverse dataset of physics-specific symbolic expressions extracted from actual student responses to problem-solving tasks. This dataset encompassed a wide range of equations, formulae, and mathematical notations, representing various concepts and levels of complexity. We then evaluated several state-of-the-art embedding models – including Word2Vec, GloVe, FastText, BERT, and OpenAI’s text-embedding-3-large – using both similarity-based analyses and integration into a machine learning pipeline. Similarity-based analyses measured the ability of each model to capture the semantic relationships between different symbolic expressions, while the machine learning pipeline assessed their performance in predicting student performance on problem-solving tasks.
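The two evaluation modes described above can be sketched end to end. Assuming, purely for illustration, that each expression has already been embedded, the snippet below computes pairwise cosine similarities and then runs the same vectors through a minimal nearest-centroid classifier as a stand-in for the study’s machine learning pipeline; the four expressions, their three-dimensional vectors, and the labels are all invented.

```python
import math

# Hypothetical pre-computed embeddings for four student expressions
# (values invented; a real pipeline would obtain these from an embedding model)
embeddings = {
    "F = m a":     [0.9, 0.1, 0.0],
    "F = m dv/dt": [0.8, 0.2, 0.1],
    "E = m c^2":   [0.1, 0.9, 0.2],
    "E = h f":     [0.2, 0.8, 0.3],
}
labels = {"F = m a": "mechanics", "F = m dv/dt": "mechanics",
          "E = m c^2": "energy", "E = h f": "energy"}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# 1) Similarity-based analysis: pairwise cosine similarities between expressions
exprs = list(embeddings)
sim = {(p, q): cosine(embeddings[p], embeddings[q]) for p in exprs for q in exprs}

# 2) Downstream pipeline: nearest-centroid classification on the same embeddings
def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

centroids = {
    cls: centroid([embeddings[e] for e in exprs if labels[e] == cls])
    for cls in set(labels.values())
}

def classify(vec):
    return max(centroids, key=lambda cls: cosine(vec, centroids[cls]))

print(classify([0.85, 0.15, 0.05]))  # prints "mechanics"
```

A real pipeline would of course use a stronger classifier and many more responses; the point is only that the embedding vectors serve double duty as both the objects of similarity analysis and the features of the downstream model.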

Our results demonstrated that OpenAI’s text-embedding-3-large consistently outperformed the other examined models in representing and interpreting science-related symbolic expressions. However, practical considerations such as computational cost and ease of implementation must be carefully weighed.

The ability to accurately represent and interpret science-related symbolic expressions has broader applications in areas such as automated theorem proving, scientific literature mining, and the development of intelligent tutoring systems. By providing a comparative analysis of different embedding models, this study contributes to the advancement of NLP techniques for handling complex scientific language.

Furthermore, this study highlights the importance of domain-specific knowledge in NLP applications. While general-purpose embedding models can perform well on a variety of tasks, they often struggle with specialised domains like science and mathematics. By incorporating domain-specific knowledge into the model training process, it is possible to significantly improve performance and accuracy.
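As one illustration of how domain knowledge can be injected even before any training, a preprocessing step can normalise notational variants so that equivalent expressions embed alike. The rewrite rules below are our own sketch of such a step, not a method from the study.

```python
import re

# Illustrative domain-aware normalisation for physics expressions
# (our own sketch of one way to inject domain knowledge; not the study's method)
REWRITES = [
    (r"\s+", " "),             # collapse runs of whitespace
    (r"\*", " "),              # treat explicit multiplication like juxtaposition
    (r"_\{?(\w+)\}?", r"\1"),  # subscripts: v_0 and v_{0} both become v0
    (r"\\cdot", " "),          # LaTeX multiplication dot
]

def normalise(expr: str) -> str:
    out = expr.strip()
    for pattern, repl in REWRITES:
        out = re.sub(pattern, repl, out)
    return re.sub(r"\s+", " ", out).strip()

print(normalise("v = v_{0} + a \\cdot t"))  # -> "v = v0 + a t"
print(normalise("v=v_0+a*t"))               # -> "v=v0+a t"
```

Even a handful of such rules can collapse surface-level notational differences that a general-purpose tokenizer would otherwise treat as entirely distinct inputs.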

We acknowledge that this study has several limitations. First, our dataset was limited to physics-specific symbolic expressions, and further research is needed to evaluate the performance of different embedding models on other scientific disciplines. Second, our evaluation was based on a relatively small number of student responses, and larger-scale studies are needed to confirm our findings.

Despite these limitations, this study provides valuable insights into the challenges and opportunities of applying NLP to the analysis of science-related symbolic expressions. Future research should focus on expanding the scope of our evaluation and exploring more advanced NLP techniques.

In conclusion, this research contributes to the advancement of educational tools and promotes a deeper understanding of student learning in STEM fields. We encourage future research to build upon these findings and explore innovative approaches to enhance the accuracy and effectiveness of NLP applications in education.

👉 More information
🗞 Handling Symbolic Language in Student Texts: A Comparative Study of NLP Embedding Models
🧠 DOI: https://doi.org/10.48550/arXiv.2505.17950

Quantum News

As the Official Quantum Dog (or hound), my role is to dig out the latest nuggets of quantum goodness. There is so much happening right now in the field of technology, whether AI or the march of robots. But quantum occupies a special space. Quite literally a special space: a Hilbert space, in fact, haha! Here I try to provide some of the news that might be considered breaking news in the quantum computing space.
