Large language models exhibit diminished comprehension when processing code-switched text, produced when speakers alternate between languages within a single conversation. However, embedding English within other languages sometimes improves performance. Fine-tuning consistently mitigates this degradation more effectively than prompting techniques alone.
The increasing prevalence of multilingual communication online presents a significant challenge for artificial intelligence systems designed to process natural language. Individuals routinely blend languages within single conversations, a phenomenon known as code-switching, in which phrases or even single words from one language appear within a discourse primarily conducted in another. This linguistic practice, common in multilingual communities, now frequently appears in digital content and poses a problem for Large Language Models (LLMs), which underpin many content creation and analysis tools. Amr Mohamed, Yang Zhang, Michalis Vazirgiannis, and Guokan Shang, researchers at the Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and École Polytechnique, address this issue in their paper, “Lost in the Mix: Evaluating LLM Understanding of Code-Switched Text”. Their work systematically evaluates how LLMs comprehend text containing code-switching, utilising modified reasoning benchmarks to assess performance and explore mitigation strategies such as prompt engineering and fine-tuning.
Large language models (LLMs) exhibit a performance decline when processing code-switched text, yet they retain a capacity for comprehension even when tokens from other languages are introduced. The researchers generate code-switched variants of established reasoning and comprehension benchmarks and systematically evaluate LLM performance on them, showing that disrupting the dominant English structure of a passage reduces accuracy. However, embedding English within other languages frequently improves comprehension, suggesting a degree of robustness within these models and highlighting the complex interplay between linguistic structure and model understanding. Code-switching, a linguistic phenomenon in which speakers alternate between two or more languages within a single conversation, presents a unique challenge for LLMs accustomed to processing monolingual text.
The methodology centres on generating code-switched text by replacing nouns in English sentences with their equivalents in other languages, including Arabic, Chinese, French, and German. This approach allows for controlled evaluation of how LLMs handle linguistic mixing, following the matrix-language view of code-switching, in which one language (the matrix language) supplies the grammatical frame while the other contributes embedded words. The research confirms that models are sensitive to deviations from this structure, but also highlights their ability to process mixed-language input under certain conditions, demonstrating a nuanced response to linguistic variation. The controlled substitution of nouns allows researchers to isolate the impact of lexical variation on model performance.
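To make the procedure concrete, the sketch below substitutes English nouns with embedded-language equivalents while leaving the English grammatical frame intact. It is a minimal illustration rather than the authors' pipeline: the spaCy tagger and the toy English-French glossary are assumptions made for the example.

```python
# Minimal sketch of matrix-language noun substitution (illustrative only; not
# the authors' exact pipeline). Assumes spaCy and its small English model.
import spacy

nlp = spacy.load("en_core_web_sm")

# Toy English -> French glossary; a real pipeline would draw on aligned
# translations or a bilingual lexicon.
EN_FR = {"model": "modèle", "text": "texte", "answer": "réponse"}

def code_switch(sentence: str, glossary: dict[str, str]) -> str:
    """Replace English nouns with embedded-language equivalents,
    keeping English as the matrix language (grammar and word order)."""
    doc = nlp(sentence)
    pieces = []
    for tok in doc:
        if tok.pos_ == "NOUN" and tok.lemma_.lower() in glossary:
            pieces.append(glossary[tok.lemma_.lower()] + tok.whitespace_)
        else:
            pieces.append(tok.text_with_ws)
    return "".join(pieces)

print(code_switch("The model reads the text and writes the answer.", EN_FR))
# e.g. "The modèle reads the texte and writes the réponse."
```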
Evaluation encompasses multiple natural language processing (NLP) tasks, including multiple-choice question answering (Belebele and MMLU) and natural language inference (XNLI), providing a comprehensive assessment of LLM comprehension in the context of code-switching. These tasks challenge the models not only to understand individual words but also to integrate information across languages, revealing the extent of their cross-lingual reasoning capabilities. Results indicate that while performance initially suffers with the introduction of foreign-language elements, strategic prompting can offer some mitigation, though its effectiveness remains variable. Prompt engineering, the craft of designing input prompts that guide model behaviour, proves partially effective in recovering lost performance.
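One plausible shape for such a mitigation prompt is sketched below for a Belebele-style multiple-choice item; the wording of the hint and the template layout are assumptions for illustration, not the paper's exact prompts.

```python
# Sketch of a multiple-choice evaluation prompt with an optional code-switching
# hint (one plausible mitigation strategy; the study's exact templates may differ).
def build_prompt(passage: str, question: str, options: list[str],
                 cs_hint: bool = False) -> str:
    hint = ("Note: the passage may mix several languages. "
            "Interpret any non-English words before answering.\n\n") if cs_hint else ""
    letters = "ABCD"
    opts = "\n".join(f"{letters[i]}. {o}" for i, o in enumerate(options))
    return (f"{hint}Passage: {passage}\n\n"
            f"Question: {question}\n{opts}\n"
            "Answer with a single letter (A-D).")

print(build_prompt(
    passage="The modèle was trained on multilingual Daten from the web.",
    question="What was the model trained on?",
    options=["Images", "Multilingual data", "Audio", "Source code"],
    cs_hint=True,
))
```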
A key finding centres on the benefits of instruction tuning, specifically fine-tuning the LLaMA-3.1-8B-Instruct model, which significantly improves performance on code-switched text. By creating a dataset of over 14,000 code-switched examples derived from parallel TED Talk translations, researchers demonstrate a more stable and reliable path to mitigating performance degradation, showcasing the power of targeted training data. The use of varied prompt templates during fine-tuning prevents overfitting and encourages generalization, allowing the model to better handle unseen code-switched text, ultimately enhancing its adaptability and robustness. Fine-tuning, a process of further training a pre-trained model on a specific dataset, allows the model to specialise in handling code-switched language.
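A rough sketch of this kind of instruction tuning is given below, using LoRA adapters via the Hugging Face trl and peft libraries. The data format, hyperparameters, and model identifier are illustrative assumptions, and the exact trl API surface varies across versions, so this should be read as an outline of the approach rather than the authors' setup.

```python
# Sketch of LoRA instruction tuning on code-switched examples (illustrative;
# not the paper's configuration). The real dataset comprises ~14,000 examples
# built from parallel TED Talk translations with varied prompt templates.
from datasets import Dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

examples = [
    {"text": "### Instruction:\nAnswer the question about the passage.\n"
             "### Passage:\nThe modèle was trained on multilingual Daten.\n"
             "### Question:\nWhat was it trained on?\n"
             "### Answer:\nMultilingual data."},
    # ... further code-switched instruction/response pairs ...
]
dataset = Dataset.from_list(examples)

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B-Instruct",   # assumed hub identifier
    train_dataset=dataset,
    args=SFTConfig(output_dir="cs-llama",
                   num_train_epochs=1,
                   per_device_train_batch_size=1),
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()
```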
Researchers investigate the impact of different code-switching patterns and language pairings on LLM performance, aiming to identify the factors that contribute to successful comprehension. Exploring the role of linguistic features, such as morphological similarity and syntactic alignment, provides further insights into the mechanisms underlying code-switching comprehension, revealing the intricate relationship between linguistic structure and model processing. Additionally, research focuses on developing more robust and adaptive prompting strategies that dynamically adjust to the specific characteristics of the code-switched input, paving the way for more effective and nuanced interactions with multilingual text. Morphological similarity refers to the degree to which the word structures of different languages resemble each other.
Expanding the dataset to include a wider range of languages and domains would enhance the generalizability of the findings, helping models perform well across diverse linguistic contexts. Investigating cross-lingual transfer learning, where knowledge gained from processing code-switched text in one language is applied to another, represents a promising avenue for future research, potentially unlocking new levels of efficiency and performance. Finally, applying these findings to real-world scenarios, such as multilingual customer service and social media analysis, would demonstrate the practical benefits of improving LLM comprehension of code-switched text. Cross-lingual transfer learning leverages similarities between languages to improve model performance on low-resource languages.
The study underscores the importance of understanding the interplay between linguistic structure and model processing, revealing that LLMs are not simply pattern-matching machines but rather complex systems that can leverage contextual cues and linguistic knowledge to overcome challenges. This nuanced understanding of LLM processing is crucial for developing more effective and robust models that can handle the complexities of human language. The research suggests that LLMs possess a degree of linguistic awareness beyond simple statistical pattern recognition.
Researchers emphasize the need for continued research in this area, highlighting the potential for LLMs to revolutionise the way we interact with multilingual text. By developing models that can seamlessly process code-switched language, we can unlock new opportunities for communication, collaboration, and understanding across cultures. This research represents a significant step towards realising that vision. The ability to process code-switched language is crucial for building truly inclusive and globally accessible language technologies.
👉 More information
🗞 Lost in the Mix: Evaluating LLM Understanding of Code-Switched Text
🧠 DOI: https://doi.org/10.48550/arXiv.2506.14012
