TongSearch-QR demonstrates that smaller language models, specifically Qwen2.5-7B and Qwen2.5-1.5B, can achieve reasoning-intensive information retrieval performance comparable to that of larger models like GPT-4. Employing reinforcement learning with a semi-rule-based reward function, TongSearch-QR enhances query rewriting and outperforms existing retrieval baselines on the BRIGHT benchmark.
The challenge of effectively retrieving information from vast datasets intensifies when queries demand more than simple keyword matching, requiring instead complex reasoning and semantic understanding. Researchers are now focusing on methods to augment queries before retrieval, leveraging the capabilities of large language models (LLMs) to anticipate information needs. However, the computational expense of deploying these powerful LLMs presents a significant obstacle to practical implementation. A team from the Beijing Institute for General Artificial Intelligence (BIGAI), comprising Xubo Qin, Jun Bai, Jiaqi Li, Zixia Jia, and Zilong Zheng, addresses this issue in their work, “TongSearch-QR: Reinforced Query Reasoning for Retrieval”. They present a novel approach utilising reinforcement learning to enable smaller, more efficient language models, such as Qwen2.5, to perform query reasoning with performance comparable to much larger systems, offering a pathway towards deployable reasoning-intensive information retrieval.
TongSearch-QR addresses limitations that conventional retrieval encounters on tasks demanding complex reasoning. The system employs a reinforcement learning framework to train comparatively small language models, namely Qwen2.5-7B-Instruct and Qwen2.5-1.5B-Instruct, to refine search queries and improve retrieval performance without the substantial computational cost associated with larger models.
Traditional information retrieval systems, which rely heavily on textual and semantic matching, frequently struggle with intricate queries that require multi-step inference and a nuanced understanding of semantics. TongSearch-QR incorporates an explicit query reasoning stage, using a language model to reformulate the search query before document retrieval, thereby improving performance on these complex queries. This contrasts with methods that focus solely on keyword matching or semantic similarity.
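The two-stage pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the rewriter stands in for the trained Qwen2.5 model (here any callable), and the retriever for a BM25 index.

```python
from typing import Callable, List

def reason_then_retrieve(
    query: str,
    rewriter: Callable[[str], str],
    retriever: Callable[[str], List[str]],
) -> List[str]:
    """Query-reasoning retrieval: the rewriter (an LLM in the paper, any
    callable here) expands the raw query before the retriever sees it."""
    reasoned_query = rewriter(query)
    return retriever(reasoned_query)

# Toy usage with mock components, purely to show the control flow.
docs = ["reinforcement learning for query rewriting", "pasta recipes"]
mock_rewriter = lambda q: q + " reinforcement learning"  # stands in for the LLM
mock_retriever = lambda q: [d for d in docs if any(t in d for t in q.split())]
results = reason_then_retrieve("query rewriting", mock_rewriter, mock_retriever)
```

The point of the design is that the retriever itself stays unchanged; all reasoning capability lives in the rewriting step.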
The system utilises a semi-rule-based reward function that incentivises the language model to generate reasoning-rich queries wrapped in specific tags: one tag encapsulates the reasoning process itself, while another contains the final, refined query. A negative reward is assigned immediately whenever the output deviates from this prescribed format, which enforces adherence to the structure, allows the reward signal to propagate effectively, and keeps model behaviour controlled and predictable.
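A minimal sketch of such a reward is shown below. The tag names `<think>` and `<answer>` are illustrative assumptions (the original markup was lost in this article's rendering), and `quality_fn` is a hypothetical stand-in for whatever score the training pipeline assigns to a well-formed rewritten query.

```python
import re

# Assumed tag names; the paper's exact markup may differ.
FORMAT_RE = re.compile(r"^<think>(.+?)</think>\s*<answer>(.+?)</answer>$", re.DOTALL)

def semi_rule_based_reward(completion: str, quality_fn) -> float:
    """Semi-rule-based reward: a hard rule gates the format, and only
    well-formed outputs receive a content-based quality score."""
    match = FORMAT_RE.match(completion.strip())
    if match is None:
        return -1.0  # immediate negative reward for any format violation
    rewritten_query = match.group(2)
    return quality_fn(rewritten_query)  # e.g. a retrieval-derived score
```

The strict gate matters for RL training: malformed outputs never reach the quality term, so the model cannot trade away format compliance for content reward.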
Evaluations using the BRIGHT benchmark, a dataset designed to assess reasoning capabilities in information retrieval, demonstrate that TongSearch-QR significantly outperforms existing baselines. These include prompt-based query reasoners, which rely on pre-defined prompts to guide reasoning, and recent dense retrieval models specifically designed for reasoning tasks. Both the 7 billion and 1.5 billion parameter versions of TongSearch-QR achieve superior results when paired with the BM25 retrieval algorithm, a ranking function based on term frequency and inverse document frequency, highlighting the system’s ability to enhance retrieval performance even with relatively modest model sizes. This offers a practical advantage for real-world deployment where computational resources are often constrained.
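For concreteness, the BM25 ranking function mentioned above can be written out in a few lines. This is a textbook Okapi BM25 scorer over pre-tokenized documents, not TongSearch-QR's retrieval code; parameters `k1` and `b` take their conventional defaults.

```python
import math
from collections import Counter
from typing import List

def bm25_scores(query: List[str], corpus: List[List[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each tokenized document against the query with Okapi BM25,
    combining term frequency, inverse document frequency, and length
    normalisation."""
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Because BM25 only matches surface terms, a reasoning-rich rewritten query that adds the right vocabulary directly raises the scores of relevant documents, which is exactly the leverage the query reasoning stage exploits.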
The research leverages two versions of the BRIGHT dataset, Version 1 containing approximately 10,000 questions across nine categories and Version 2, a substantially larger set of nearly 30,000 questions, to validate the system's performance and generalisability.
Future work focuses on refining the reward function to further improve the quality of query reasoning, exploring alternative reinforcement learning algorithms, and investigating the incorporation of external knowledge sources. These improvements aim to push the boundaries of reasoning-intensive retrieval, enabling the system to handle even more complex and nuanced queries. Additionally, extending the evaluation to a broader range of datasets and tasks will further validate the generalisability and robustness of the TongSearch-QR approach.
The system’s success demonstrates the potential of combining reinforcement learning with smaller language models to achieve competitive performance in reasoning-intensive retrieval, circumventing the high inference costs associated with larger models. This offers a cost-effective and efficient solution for complex information retrieval tasks, making it accessible to a broader range of users and applications.
The research actively contributes to the field by demonstrating that effective query reasoning does not necessarily require extremely large language models, challenging the prevailing trend of relying on massive models. By leveraging reinforcement learning and a carefully crafted reward function, TongSearch-QR successfully trains smaller models to achieve competitive performance, opening up new possibilities for deploying reasoning-intensive retrieval systems in resource-constrained environments.
👉 More information
🗞 TongSearch-QR: Reinforced Query Reasoning for Retrieval
🧠 DOI: https://doi.org/10.48550/arXiv.2506.11603
