Retrieval-augmented generation (RAG) systems increasingly rely on efficient information retrieval pipelines to process queries and formulate responses, yet existing research platforms often hinder the deployment of these dynamic systems in real-time applications. Eugene Yang, Andrew Yates, and Dawn Lawrie, all from Johns Hopkins University, alongside James Mayfield and Trevor Adriaanse, address this challenge with RoutIR, a new Python package designed for fast serving of retrieval pipelines. RoutIR offers a simple HTTP application programming interface (API) that integrates various retrieval methods, including first-stage retrieval, reranking, query expansion, and result fusion. Pipelines can be constructed and queried on the fly, which supports applications that require looping, feedback mechanisms, or even self-organizing agents, a significant step beyond traditional offline batch processing. By automating asynchronous query batching and caching, RoutIR provides a practical way to deploy state-of-the-art retrieval models in dynamic RAG systems, and it is available as open-source software.

Cranfield Paradigm Limits Dynamic Retrieval Systems

Retrieval-augmented generation (RAG) systems function by processing queries, retrieving relevant documents, and generating a response. These systems are frequently dynamic, potentially involving multiple rounds of retrieval to refine results. Current academic information retrieval platforms generally adhere to the Cranfield paradigm, which assumes all queries are known in advance and can be processed offline in batches. This simplification limits the deployment of state-of-the-art retrieval models in downstream applications demanding online services, such as dynamic RAG pipelines incorporating looping, feedback mechanisms, or self-organizing agents.

The research presented introduces RoutIR, a Python package designed to address this limitation. RoutIR facilitates the creation of real-time retrieval services, enabling the development and evaluation of retrieval models within the context of interactive RAG systems. The package provides tools for streaming queries, managing document processing, and integrating retrieval models into dynamic pipelines.

Method

RoutIR is designed as a simple and efficient HTTP application programming interface (API) that integrates various retrieval methods: first-stage document retrieval, reranking, query expansion, and result fusion. A minimal JSON configuration file defines the retrieval models to be served, and pipelines can then be constructed and queried dynamically using any combination of the available models. Users submit HTTP requests specifying a retrieval pipeline and a search query, which RoutIR then orchestrates. Internally, the system is centred around a query processor, which manages a query queue and batcher, alongside an engine cache manager. Together these components handle asynchronous query batching automatically and provide a default caching mechanism for improved performance.
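To make the idea of asynchronous query batching concrete, here is a minimal sketch using only Python's standard `asyncio` module. It is not RoutIR's implementation: the class name `QueryBatcher`, its parameters, and the `fake_search` function are all hypothetical, chosen only to show how concurrent queries can be collected from a queue and handed to a model as one batch.

```python
import asyncio


class QueryBatcher:
    """Hypothetical batcher: groups concurrent queries into one batch call."""

    def __init__(self, batch_fn, max_batch_size=8, max_wait_s=0.01):
        self.batch_fn = batch_fn              # assumed signature: list[str] -> list[result]
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue = asyncio.Queue()

    async def submit(self, query):
        # Each caller receives a future that resolves once its batch is processed.
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((query, future))
        return await future

    async def run(self):
        loop = asyncio.get_running_loop()
        while True:
            # Wait for the first query, then keep filling the batch until it
            # is full or the waiting budget is spent.
            batch = [await self.queue.get()]
            deadline = loop.time() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(await asyncio.wait_for(self.queue.get(), remaining))
                except asyncio.TimeoutError:
                    break
            # One call serves every query in the batch.
            results = self.batch_fn([query for query, _ in batch])
            for (_, future), result in zip(batch, results):
                future.set_result(result)


batch_sizes = []  # records how many queries each batch call received


def fake_search(queries):
    # Stand-in for a retrieval model that is cheaper to call on batches.
    batch_sizes.append(len(queries))
    return [f"results for {q}" for q in queries]


async def demo():
    batcher = QueryBatcher(fake_search, max_batch_size=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    answers = await asyncio.gather(*(batcher.submit(f"q{i}") for i in range(4)))
    worker.cancel()
    return answers
```

Calling `asyncio.run(demo())` returns one answer per submitted query, in submission order, while `batch_sizes` shows how the queries were grouped. The trade-off in such a design is between `max_wait_s`, which adds latency to the first query in a batch, and `max_batch_size`, which bounds how much work a single model call receives.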

The pipeline API allows flexible experimentation with different retrieval models, including dense retrieval with FAISS, lexical search with Anserini, and large language model (LLM) rerankers, all extensible by implementing an abstract Engine class. Integrating retrieval models into dynamic systems, particularly RAG pipelines, is challenging because such systems require online querying capabilities that static, batch-oriented environments cannot provide. RoutIR addresses this by offering a flexible API for constructing and querying retrieval pipelines on the fly. The system is open-source and publicly available on GitHub, giving researchers a platform to experiment with and extend state-of-the-art retrieval methods without the overhead of modifying offline processing tools or writing inefficient API wrappers.
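The role of an abstract engine class can be illustrated with Python's standard `abc` module. The sketch below is an assumption-laden illustration, not RoutIR's actual interface: the method name `search_batch`, its signature, and the toy `KeywordOverlapEngine` are hypothetical and serve only to show how a new model could be wrapped behind a uniform interface.

```python
from abc import ABC, abstractmethod


class Engine(ABC):
    """Hypothetical base class: every retrieval component exposes the same method."""

    @abstractmethod
    def search_batch(self, queries: list[str], limit: int) -> list[dict[str, float]]:
        """Return, for each query, a mapping from document ID to score."""


class KeywordOverlapEngine(Engine):
    """Toy engine that scores a document by how many query terms it shares."""

    def __init__(self, docs: dict[str, str]):
        # Pre-tokenise each document into a set of lowercase terms.
        self.docs = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}

    def search_batch(self, queries, limit):
        results = []
        for query in queries:
            terms = set(query.lower().split())
            scored = [
                (doc_id, float(len(terms & words)))
                for doc_id, words in self.docs.items()
                if terms & words
            ]
            # Highest score first; ties are broken by document ID for determinism.
            scored.sort(key=lambda item: (-item[1], item[0]))
            results.append(dict(scored[:limit]))
        return results


engine = KeywordOverlapEngine({
    "d1": "neural ranking models",
    "d2": "lexical search with bm25",
    "d3": "neural search",
})
top = engine.search_batch(["neural search"], limit=2)
```

Because callers depend only on the base class, a dense retriever, a lexical index, or a reranker could each sit behind the same method, which is what makes it possible to combine them freely in a pipeline.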

Results

Scientists have developed RoutIR, a novel Python package designed as a high-performance HTTP API for retrieval pipelines, crucial for Retrieval-Augmented Generation (RAG) systems. This work addresses a significant gap in existing academic information retrieval platforms, which traditionally operate under the Cranfield paradigm, processing predefined queries offline. Experiments demonstrate RoutIR’s ability to construct and query pipelines on-the-fly, utilising any combination of available retrieval models, including first-stage retrieval, reranking, query expansion, and result fusion. The team measured RoutIR’s performance during the JHU SCALE 2025 workshop, successfully providing the retrieval API for PLAID-X, SPLADE-v3, and Qwen3 Embedding models across multiple TREC document collections, including NeuCLIR, RAGTIME, BioGen, and RAG.

This was achieved using only three NVIDIA TITAN RTX GPUs with 24 GB of memory each, showcasing the system’s efficiency. The results show reasonable latency without caching and near-instantaneous responses when results are cached, highlighting the effectiveness of the caching mechanisms. RoutIR achieves a throughput of 3 to 10 queries per second with asynchronous HTTP requests, a typical scenario in RAG research, where multiple queries are generated concurrently. Notably, the system also powered the search service for the TREC RAGTIME track, serving the PLAID-X model to all participants. Running on CPU only on an AWS virtual machine, that endpoint achieved a latency of approximately 600 milliseconds while serving the TREC NeuCLIR and RAGTIME collections. RoutIR incorporates dynamic batching and queuing for concurrent, asynchronous requests, alongside fast and robust caching both in memory and in Redis. The package supports simple service configuration for common model architectures and requires only minimal wrappers to incorporate new models, providing a flexible and easily extensible framework for RAG research and internal prototyping.
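The gap between uncached and cached latency follows from a simple pattern: key each request by its pipeline, query, and result limit, and reuse the stored result when the same key appears again. The sketch below shows that pattern with an in-memory dictionary. It is an illustration under stated assumptions, not RoutIR's code: `CachedSearch`, `make_key`, `slow_search`, and the pipeline label `"dense-then-rerank"` are hypothetical names, and a Redis-backed store would take the place of the dictionary in a shared deployment.

```python
import asyncio
import hashlib
import json


class CachedSearch:
    """Hypothetical in-memory cache in front of an async search function."""

    def __init__(self, search_fn):
        self.search_fn = search_fn   # assumed: async (pipeline, query, limit) -> list
        self.cache = {}
        self.misses = 0

    @staticmethod
    def make_key(pipeline, query, limit):
        # Serialise the request deterministically so equal requests share a key.
        payload = json.dumps(
            {"pipeline": pipeline, "query": query, "limit": limit}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    async def search(self, pipeline, query, limit=10):
        key = self.make_key(pipeline, query, limit)
        if key not in self.cache:
            # Cache miss: run the (slow) search once and remember the result.
            self.misses += 1
            self.cache[key] = await self.search_fn(pipeline, query, limit)
        return self.cache[key]


async def slow_search(pipeline, query, limit):
    await asyncio.sleep(0.01)   # stand-in for model inference time
    return [f"{pipeline}:{query}:{i}" for i in range(limit)]


async def demo():
    cached = CachedSearch(slow_search)
    first = await cached.search("dense-then-rerank", "what is RAG?", limit=2)
    second = await cached.search("dense-then-rerank", "what is RAG?", limit=2)
    return first, second, cached.misses
```

In this simplified version, two identical requests that arrive at the same moment can both miss the cache; a production service would typically also share the in-flight computation between them.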

Conclusion

This work introduces RoutIR, a toolkit designed to facilitate the rapid deployment and serving of retrieval models for retrieval-augmented generation (RAG) systems. RoutIR provides a flexible HTTP application programming interface (API) enabling the construction of dynamic pipelines, combining various retrieval methods such as initial searches, reranking, and result fusion. The system’s architecture prioritises efficiency through asynchronous query batching and caching, demonstrably increasing query throughput. RoutIR has already proven its utility as a foundational component in several research initiatives, including the 2025 JHU SCALE Workshop and the 2025 TREC RAGTIME Track, supporting over fifty researchers. The authors acknowledge that RoutIR remains under development, with planned extensions including integration of large language model rerankers and adoption of the Model Context Protocol interface, alongside improvements to resource management. Future work will focus on enhancing these features and refining the system’s capabilities based on community contributions and feedback, all of which are welcomed through the open-source GitHub repository.
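Result fusion, one of the pipeline stages named above, is often implemented with reciprocal rank fusion (RRF), in which each document receives the sum of 1/(k + rank) over the input rankings. The sketch below is a generic illustration of that technique. The article does not say which fusion method RoutIR uses, and the function name, the constant k = 60, and the document IDs are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking."""
    fused = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


dense_ranking = ["d1", "d2", "d3"]     # hypothetical output of a dense retriever
lexical_ranking = ["d2", "d4", "d1"]   # hypothetical output of a lexical retriever
fused_ranking = reciprocal_rank_fusion([dense_ranking, lexical_ranking])
```

Documents that appear near the top of both lists, here `d2` and `d1`, rise above documents returned by only one retriever, which is the behaviour that makes fusion useful when combining dense and lexical models.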

👉 More information
🗞 RoutIR: Fast Serving of Retrieval Pipelines for Retrieval-Augmented Generation
🧠 ArXiv: https://arxiv.org/abs/2601.10644


RoutIR Achieves Fast Serving for Retrieval-Augmented Generation Pipelines with Dynamic Queries


Rohail T.
