Researchers have developed a retrieval framework that improves Large Language Model performance on long-context tasks by using logical inference, eliminating the need for embedding-based search or complex graph construction. Evaluations on benchmarks such as NovelQA and Marathon demonstrate improved accuracy alongside a more than ten-fold reduction in storage and processing time.
The limitations of large language models (LLMs) in processing extended contexts pose a significant challenge for tasks requiring comprehensive information recall, such as detailed question answering and multi-turn dialogues. Current retrieval-augmented generation (RAG) systems, designed to address this, frequently depend on pre-calculated semantic embeddings – numerical representations of text – to identify relevant information. However, these embeddings can sometimes prioritise superficial similarity over genuine contextual alignment, and more complex RAG variants introduce substantial computational costs. Researchers are now exploring alternative approaches that leverage the inherent reasoning capabilities of LLMs themselves to refine search parameters and expand results with logically connected data, without relying on extensive pre-processing or graph-based structures. This work, detailed in ‘ELITE: Embedding-Less retrieval with Iterative Text Exploration’, is the result of collaboration between Zhangyu Wang (University of Southern California), Siyuan Gao (Jilin University), Rong Zhou (National Supercomputing Center in Shenzhen), Hao Wang (Wuhan University), and Li Ning (Stellaris AI Limited).
Enhanced Retrieval Augmentation for Long-Context Question Answering
Recent Large Language Models (LLMs) demonstrate considerable capabilities, yet inherent limitations in processing extended contexts hinder performance on tasks requiring comprehensive document understanding. Retrieval-Augmented Generation (RAG) addresses this challenge by supplementing LLMs with relevant information retrieved from external sources, effectively expanding their knowledge base and improving response accuracy. However, conventional RAG systems frequently rely on embedding-based retrieval, which can retrieve content superficially similar to the query but lacking genuine relevance, ultimately diminishing the quality of generated responses. This research presents an embedding-free retrieval framework designed to overcome these limitations and unlock the full potential of RAG for long-context question answering.
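To make that failure mode concrete, the toy sketch below ranks two passages against a question using bag-of-words cosine similarity as a crude stand-in for learned embeddings. The functions and example texts are illustrative assumptions, not drawn from the paper:

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a learned model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

query = "who poisoned the king in chapter three"
chunks = [
    # Lexically close to the query, but does not answer it.
    "The king spoke in chapter three about poison and its history.",
    # Contains the answer, but shares almost no vocabulary with the query.
    "Late that night, the steward slipped hemlock into the royal cup.",
]
for chunk in chunks:
    print(f"{cosine(embed(query), embed(chunk)):.2f}  {chunk}")
# The irrelevant passage scores ~0.57, the answer-bearing one ~0.21:
# surface similarity wins, which is precisely the failure mode ELITE targets.
```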
The authors propose a method that leverages the logical inference capabilities of LLMs during retrieval, shifting the paradigm from static semantic similarity to dynamic reasoning. Instead of relying on pre-computed embeddings – vector representations of text used to quantify semantic similarity – the system iteratively refines the search space, guided by a novel importance measure that prioritises information logically connected to the question. The approach extends the retrieval results with pertinent, related content without the computational burden of constructing explicit graph or hierarchical structures, a significant advantage for efficiency and scalability.
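A minimal sketch of what such an iterative, embedding-free loop might look like is given below, assuming a generic `llm()` completion function. The prompts, the keyword-overlap importance proxy, and all names are illustrative assumptions rather than the authors' implementation:

```python
from typing import Callable, List

def retrieve_iteratively(
    question: str,
    document: str,
    llm: Callable[[str], str],  # any completion function: prompt -> text (assumed interface)
    rounds: int = 3,
    window: int = 400,          # characters of context kept around each hit
) -> List[str]:
    # The model proposes initial lexical search terms from the question alone.
    terms = llm(f"List comma-separated search terms for: {question}").lower().split(",")
    selected: List[str] = []
    lower_doc = document.lower()
    for _ in range(rounds):
        # Plain text scan: no embeddings, no index or graph construction.
        hits: List[str] = []
        for term in (t.strip() for t in terms if t.strip()):
            start = lower_doc.find(term)
            while start != -1:
                hits.append(document[max(0, start - window): start + window])
                start = lower_doc.find(term, start + 1)
        # Importance proxy: overlap with the question's vocabulary
        # (the paper's actual measure differs).
        qwords = set(question.lower().split())
        hits.sort(key=lambda h: len(qwords & set(h.lower().split())), reverse=True)
        selected = hits[:5]
        # The model reads the current evidence and refines its terms,
        # pulling in logically connected leads (names, places, events).
        terms = llm(
            "Given the question and these passages, propose better comma-separated "
            f"search terms.\nQuestion: {question}\nPassages: {selected}"
        ).lower().split(",")
    return selected
```

The key property is that no vector index is ever built: the document is only scanned lexically, and the model's own reasoning steers what gets scanned next.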
Evaluations on the long-context question answering benchmarks NovelQA and Marathon demonstrate the efficacy of this approach. The proposed method consistently outperforms strong baseline RAG systems, achieving higher accuracy and more coherent responses, while cutting both storage requirements and runtime by more than an order of magnitude. This combination of improved performance and reduced computational cost positions the method as a promising advance in knowledge retrieval.
This work offers a compelling alternative to embedding-based RAG systems, particularly for applications demanding processing of very long documents and complex queries. By shifting the focus from static semantic similarity to dynamic logical inference, the authors present a more efficient and effective method for augmenting LLMs with external knowledge, enabling them to tackle more challenging tasks. The demonstrated reduction in computational overhead positions this approach as a viable solution for resource-constrained environments and large-scale information retrieval tasks.
The proposed framework iteratively refines the search space, guided by a novel importance measure that narrows retrieval to content logically bearing on the query. At each step the system assesses the logical connection between the query and candidate passages, extending the retrieval results with logically related information and enriching the context supplied to the LLM without explicit graph construction or complex data structures. This contrasts sharply with traditional RAG systems built on static semantic similarity, which can surface superficially related but ultimately irrelevant or misleading results.
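The paper defines its own importance measure; as a loud assumption, one simple way to realise a "logical connection" score is to ask the LLM itself to grade each candidate, as sketched below. The prompt, scale, and threshold are hypothetical:

```python
from typing import Callable, List

def importance(question: str, span: str, llm: Callable[[str], str]) -> float:
    """Ask the model to grade, 0-10, how directly `span` helps answer `question`."""
    reply = llm(
        "On a scale of 0 to 10, how directly does this passage help answer "
        f"the question? Reply with a number only.\nQuestion: {question}\nPassage: {span}"
    )
    try:
        return float(reply.strip())
    except ValueError:
        return 0.0  # unparseable reply: treat the span as irrelevant

def filter_spans(question: str, spans: List[str], llm: Callable[[str], str],
                 threshold: float = 6.0) -> List[str]:
    """Keep only spans whose judged logical connection clears the threshold."""
    return [s for s in spans if importance(question, s, llm) >= threshold]
```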
Results demonstrate performance gains and reduced storage requirements compared with methods built on pre-computed embeddings, such as those of Mikolov et al. (2013) and Pennington et al. (2014). Future research directions include integrating the framework with different LLM architectures and investigating how the parameters of the importance measure affect retrieval performance. Applying the framework to other natural language processing tasks, such as text summarisation, could further demonstrate its versatility. The authors also plan to investigate methods for automatically tuning the importance measure parameters to optimise retrieval for different types of queries and datasets.
👉 More information
🗞 ELITE: Embedding-Less retrieval with Iterative Text Exploration
🧠 DOI: https://doi.org/10.48550/arXiv.2505.11908
