Multilingual E-Commerce Search Improves Relevance with Data-Centric Framework Addressing Imbalance and Limited Supervision

Multilingual e-commerce search currently struggles with uneven data quality, inaccurate labels, and limited resources for less common languages, hindering the ability of powerful language models to accurately connect customers with products. Yabo Yin, Yang Xi, and Jialong Wang, alongside Shanqi Wang and Jiateng Hu, address this challenge by focusing not on altering the search technology itself, but on fundamentally improving the data used to train it. Their work introduces a practical framework that enhances performance in two key areas, matching searches to relevant product categories and to individual product titles, through a combination of data augmentation, refined negative sampling, and automated error detection. Evaluated on a large e-commerce dataset, this data-centric approach consistently outperforms existing language model-based search systems, demonstrating that careful data engineering can be a highly effective, and readily implementable, strategy for building robust multilingual e-commerce search.

Recent studies show a growing trend toward using large language models (LLMs) to build more capable search systems that handle multiple languages and improve relevance matching for a better user experience. Models such as Llama 2, Qwen3, Bloom, and Alpaca serve as starting points, with fine-tuning techniques such as LoRA adapting them efficiently to specific e-commerce tasks. Researchers also use reinforcement learning from human feedback to train LLMs to provide more helpful and harmless search results.

A significant challenge lies in supporting multiple languages, requiring accurate machine translation and techniques for cross-lingual representation learning to understand queries and products regardless of language. Handling queries that combine multiple languages, known as code-switching, is another area of active research. Improving relevance matching involves contrastive learning, incorporating hierarchical category structures for long-tail queries, and creating effective embeddings for both queries and products. Models that can focus on relevant parts of queries and product descriptions further enhance performance.

Frameworks such as CSRM-LLM and LREF are being developed to leverage multilingual LLMs, and techniques such as PagedAttention address efficient memory management for LLM serving. Overall, these studies show that LLMs are transforming e-commerce search, with multilingual support and relevance matching as core areas of focus and efficient serving as a practical concern at deployment time.

Cross-lingual Data Augmentation for E-commerce Search

Scientists addressed the challenges of multilingual e-commerce search by developing a data-centric framework to improve query relevance. Rather than modifying the large language model itself, they redesigned the training data using three complementary strategies to enhance performance on both query-category and query-item relevance tasks. First, they implemented translation-based augmentation, using a massively multilingual translation model to create training examples for languages with limited data, enabling cross-lingual knowledge transfer without requiring native language supervision. To further refine performance, the team developed a semantic negative sampling technique, constructing challenging negative examples by pairing translated queries with plausible, yet incorrect, category paths or product items, sharpening the model’s ability to discriminate between relevant and irrelevant results.
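The semantic negative sampling idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `path_similarity` and `hard_negatives`, the example category paths, and the use of token overlap as a stand-in for semantic similarity are all assumptions made for illustration. A real system would more likely rank candidates with an embedding model.

```python
from typing import List, Tuple


def path_similarity(a: str, b: str) -> float:
    """Token-level Jaccard overlap between two category paths.

    Assumption: a crude proxy for semantic similarity, used only to keep
    the sketch dependency-free.
    """
    tokens_a = set(a.lower().replace(">", " ").split())
    tokens_b = set(b.lower().replace(">", " ").split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def hard_negatives(
    query: str, positive_path: str, all_paths: List[str], k: int = 2
) -> List[Tuple[str, str, int]]:
    """Pair the query with the k incorrect paths most similar to the correct one."""
    candidates = [p for p in all_paths if p != positive_path]
    # Most similar first: plausible but wrong paths make harder negatives.
    candidates.sort(key=lambda p: path_similarity(positive_path, p), reverse=True)
    return [(query, p, 0) for p in candidates[:k]]  # label 0 = not relevant


# Hypothetical category paths and a translated query, for illustration only.
paths = [
    "Electronics > Phones > Phone Cases",
    "Electronics > Phones > Screen Protectors",
    "Electronics > Audio > Headphones",
    "Home > Kitchen > Cookware",
]
negatives = hard_negatives("funda para iphone", paths[0], paths, k=2)
for query, path, label in negatives:
    print(query, "|", path, "|", label)
```

Here the unrelated "Home > Kitchen > Cookware" path is never selected, because an easy negative of that kind teaches the model little about fine-grained distinctions.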

Complementing these techniques, they introduced self-validation filtering, using model-based consistency checks to identify and remove likely mislabeled or ambiguous training instances, significantly enhancing the reliability of the training data. Experiments using the Qwen3 large language model, kept fixed throughout the study, evaluated the impact of these data engineering strategies on the CIKM AnalytiCup 2025 dataset. The results demonstrated consistent F1 score improvements on both query-category and query-item tasks, showcasing the effectiveness of a systematic, data-centric approach to building robust multilingual search systems. This work highlights that carefully engineered training data can be as impactful as, and often more deployable than, complex model modifications in real-world e-commerce settings.
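As a rough illustration of self-validation filtering, the sketch below drops training examples whose label a model confidently contradicts. The article does not give the authors' exact consistency check, so the `predict_proba` callable, the 0.9 confidence threshold, and the toy scores are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Example = Dict[str, Any]


def self_validation_filter(
    examples: List[Example],
    predict_proba: Callable[[str, str], float],
    threshold: float = 0.9,
) -> Tuple[List[Example], List[Example]]:
    """Split examples into (kept, flagged).

    An example is flagged when the model's prediction disagrees with the
    label AND the model is at least `threshold` confident in its prediction.
    """
    kept, flagged = [], []
    for ex in examples:
        p_relevant = predict_proba(ex["query"], ex["target"])
        predicted = 1 if p_relevant >= 0.5 else 0
        confidence = p_relevant if predicted == 1 else 1.0 - p_relevant
        if predicted != ex["label"] and confidence >= threshold:
            flagged.append(ex)  # likely mislabeled
        else:
            kept.append(ex)
    return kept, flagged


# Toy stand-in for a trained relevance model (assumption, not a real model).
TOY_SCORES = {
    ("red dress", "Women > Dresses"): 0.97,
    ("red dress", "Tools > Drills"): 0.02,
    ("usb cable", "Electronics > Cables"): 0.55,
}


def toy_predict_proba(query: str, target: str) -> float:
    return TOY_SCORES[(query, target)]


data = [
    {"query": "red dress", "target": "Women > Dresses", "label": 1},
    {"query": "red dress", "target": "Tools > Drills", "label": 1},  # suspicious label
    {"query": "usb cable", "target": "Electronics > Cables", "label": 0},
]
kept, flagged = self_validation_filter(data, toy_predict_proba)
print(len(kept), "kept;", len(flagged), "flagged")
```

The third example disagrees with the model but is kept, because the model is not confident enough. Setting the threshold high errs on the side of retaining ambiguous data rather than discarding it.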

Multilingual E-commerce Search Relevance Improved Significantly

Scientists achieved substantial improvements in multilingual e-commerce search relevance through a novel data-centric framework, demonstrating significant gains on the CIKM AnalytiCup 2025 dataset. The research focused on enhancing performance in two core tasks: query-category relevance and query-item relevance. Rather than altering the underlying large language model, the team redesigned the training data using three complementary strategies to address challenges of language imbalance and label quality. The team employed translation-based augmentation, synthesizing training examples for languages absent in the original dataset, thereby enabling cross-lingual knowledge transfer without requiring native supervision.
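The translation-based augmentation step can be sketched as follows. This is an assumption-laden illustration rather than the authors' pipeline: `augment_with_translation` and `toy_translate` are hypothetical names, and the lookup table stands in for a call to a real multilingual translation model.

```python
from typing import Any, Callable, Dict, List

Example = Dict[str, Any]


def augment_with_translation(
    examples: List[Example],
    translate: Callable[[str, str, str], str],
    target_langs: List[str],
) -> List[Example]:
    """Create synthetic examples by translating each query into target languages.

    The target (category path or item) and the label are copied unchanged,
    which is what lets supervision transfer across languages.
    """
    augmented = []
    for ex in examples:
        for lang in target_langs:
            if lang == ex["lang"]:
                continue  # skip the language the example is already in
            augmented.append(
                {
                    **ex,
                    "query": translate(ex["query"], ex["lang"], lang),
                    "lang": lang,
                    "synthetic": True,  # mark so it can be filtered or weighted later
                }
            )
    return augmented


# Toy lookup table standing in for a translation model (assumption).
TOY_TABLE = {
    ("phone case", "es"): "funda para celular",
    ("phone case", "fr"): "coque pour portable",
}


def toy_translate(text: str, src_lang: str, tgt_lang: str) -> str:
    return TOY_TABLE[(text, tgt_lang)]


seed = [
    {
        "query": "phone case",
        "lang": "en",
        "target": "Electronics > Phones > Phone Cases",
        "label": 1,
    }
]
augmented = augment_with_translation(seed, toy_translate, ["en", "es", "fr"])
for ex in augmented:
    print(ex["lang"], "|", ex["query"], "|", ex["label"])
```

Marking synthetic rows explicitly is a design choice of this sketch; it makes it easy to apply a later quality filter to translated data only.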

Experiments revealed this technique successfully expanded language coverage and improved model performance across underrepresented languages. Furthermore, semantic negative sampling generated challenging negative examples by pairing translated queries with plausible but incorrect category paths or items, sharpening the model’s ability to discriminate between relevant and irrelevant results. To enhance label reliability, scientists implemented self-validation filtering, utilizing model-based consistency checks to identify and remove likely mislabeled training instances. Using Qwen3 as a fixed base large language model, the team consistently achieved F1 score improvements on both query-category and query-item tasks, demonstrating the effectiveness of their data-centric approach. The research confirms that systematic data engineering can be as impactful as, and often more deployable than, complex model modifications, offering a practical pathway to building robust multilingual search systems for real-world e-commerce applications.
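Since the results above are reported as F1 scores, a small worked example of the metric may help. The sketch computes binary F1 on the positive ("relevant") class; whether the competition averages per language or per task is not stated in the article, so that choice is an assumption here.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)
print(round(score, 3))
```

With precision and recall both equal to 2/3 in this toy case, the F1 score is also 2/3.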

Data Engineering Boosts Multilingual E-commerce Search

This research presents a data-centric framework to improve multilingual e-commerce search, specifically for determining the relevance of queries to both product categories and individual product titles. The team systematically addressed challenges arising from linguistic differences, imbalanced datasets, and potential errors in training data, achieving substantial performance gains without altering the underlying search model. Through translation-based data augmentation, the generation of challenging negative examples, and filtering of likely mislabeled data, the approach consistently outperformed strong baseline models on the CIKM AnalytiCup 2025 competition dataset. The findings demonstrate that careful data engineering can be a highly effective, and often more practical, strategy for enhancing search performance compared to complex model modifications.

The research highlights the importance of analysing data characteristics before implementing enhancements, offering valuable guidance for building robust multilingual search systems in real-world e-commerce settings. While a performance gap remains, the team’s success with a purely data-centric approach validates its potential as a component within more advanced systems. The authors acknowledge that further improvements could be achieved by enhancing robustness to variations in queries, such as misspellings and slang, through targeted data augmentation. They also suggest that establishing iterative data refinement pipelines involving human feedback could create a continuous cycle of dataset improvement. Future research directions include integrating external knowledge sources, like product knowledge graphs, and exploring advanced reinforcement learning techniques tailored to this domain, potentially with larger, more specialized foundation models.

👉 More information
🗞 A Data-Centric Approach to Multilingual E-Commerce Product Search: Case Study on Query-Category and Query-Item Relevance
🧠 ArXiv: https://arxiv.org/abs/2510.21671

Rohail T.


