Lancer Achieves Enhanced Long-Form RAG with Comprehensive Information Coverage

Researchers are tackling the challenge of providing comprehensive answers in long-form retrieval-augmented generation (RAG) systems, where simply finding relevant documents isn’t enough. Jia-Huei Ju, François G. Landry (Université de Moncton), Eugene Yang (Johns Hopkins University), Suzan Verberne (Leiden University), Andrew Yates (Johns Hopkins University), and colleagues introduce LANCER, a novel LLM-based reranking method designed to maximise information coverage. Unlike current approaches focused solely on relevance, LANCER predicts the underlying sub-questions a query demands, identifies documents addressing those specific points, and then reranks results to ensure a broader range of information ‘nuggets’ are included. This work is significant because it demonstrably improves nugget coverage, achieving higher α-nDCG scores and offering more complete responses than existing LLM reranking techniques, as confirmed by detailed oracle analysis.

Scientists have developed LANCER, a novel LLM-based reranking method designed to significantly improve information coverage in long-form retrieval-augmented generation (RAG) systems. Unlike traditional RAG approaches optimised for relevance ranking, LANCER specifically addresses the need for comprehensive information retrieval, particularly in tasks like automated report generation where a wide range of relevant details is crucial. The research team achieved this by moving beyond simply identifying relevant documents and instead focusing on ensuring the retrieved context covers as many informational ‘nuggets’ as possible.

LANCER operates by first predicting the sub-questions that need answering to satisfy a given information need, then determining which documents address these sub-questions, and finally reranking the documents to maximise nugget coverage. This innovative approach centres on a three-stage process: synthetic sub-question generation, document answerability judgement, and coverage-based aggregation. The system leverages a large language model to formulate these sub-questions, effectively breaking down complex information needs into smaller, manageable components. Subsequently, LANCER assesses which documents can answer these generated sub-questions, providing a nuanced understanding of each document’s informational contribution.
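The first two stages described above might be orchestrated roughly as follows. This is a minimal illustration, not the paper's implementation: the `llm` callable, the prompt wording, and the helper names (`generate_subquestions`, `judge_answerability`, `answerability_matrix`) are all assumptions for the sketch.

```python
from typing import Callable


def generate_subquestions(llm: Callable[[str], str], query: str, n: int = 10) -> list[str]:
    """Stage 1: ask the LLM to break the information need into sub-questions."""
    prompt = (f"List up to {n} sub-questions that must be answered, one per "
              f"line, to fully satisfy this request:\n{query}")
    return [q.strip() for q in llm(prompt).splitlines() if q.strip()]


def judge_answerability(llm: Callable[[str], str], doc: str, subq: str) -> bool:
    """Stage 2: ask the LLM whether a single document answers a sub-question."""
    prompt = (f"Answer yes or no: does this passage answer the question?\n"
              f"Question: {subq}\nPassage: {doc}")
    return llm(prompt).strip().lower().startswith("yes")


def answerability_matrix(llm: Callable[[str], str],
                         docs: dict[str, str],
                         subqs: list[str]) -> dict[str, set]:
    """Collect stage-2 judgements: doc id -> indices of sub-questions it answers."""
    return {doc_id: {i for i, q in enumerate(subqs)
                     if judge_answerability(llm, text, q)}
            for doc_id, text in docs.items()}
```

In a real system, `llm` would be a client for an actual model; here it is kept abstract so the pipeline's data flow (query → sub-questions → per-document answerability sets) is the focus.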

This allows for a reranking process that prioritises documents collectively covering a broader spectrum of information nuggets, rather than solely focusing on individual document relevance. Experiments demonstrate that LANCER enhances retrieval quality as measured by nugget coverage metrics, achieving higher α-nDCG and information coverage compared to other LLM-based reranking methods. The study reveals that the generation of sub-questions plays a vital role in the success of LANCER, with oracle analysis showing substantial performance increases when provided with ideal sub-questions. This highlights the importance of accurately identifying the core informational components of a query.

Furthermore, the research establishes that optimising for coverage, rather than just relevance, can yield significant benefits in long-form RAG tasks. The team’s work opens new avenues for retrieval systems designed to support complex information needs, moving beyond simple relevance to embrace a more holistic view of information coverage. LANCER’s transparency is another key contribution, as the generated sub-questions and associated answerability scores provide a clear audit trail of the information collected and any gaps in coverage. This feature is particularly valuable in applications requiring accountability and explainability. The researchers evaluated LANCER on two datasets with nugget-level judgements, consistently demonstrating its superior performance in improving the coverage of retrieved documents. This breakthrough offers a promising solution for enhancing the quality and comprehensiveness of long-form RAG systems, with potential applications spanning automated report generation, in-depth research assistance, and more.

Method

Scientists developed LANCER, a novel LLM-based reranking method designed to enhance nugget coverage in long-form retrieval-augmented generation (RAG) systems, particularly for tasks like automated report generation. Unlike existing methods primarily focused on relevance ranking, this work directly addresses the need for comprehensive information coverage by identifying and retrieving documents that address multiple facets of a given information need. The study pioneers a three-stage process beginning with synthetic sub-question generation, utilising an LLM to predict the specific questions that must be answered to fully satisfy the user’s request. This innovative approach moves beyond simple keyword matching to a more nuanced understanding of informational requirements.

Following sub-question generation, researchers employed the LLM to assess document answerability, determining which retrieved documents effectively respond to each generated sub-question. This judgment process creates a granular understanding of each document’s informational contribution, moving beyond document-level relevance scores. The team then implemented a coverage-based aggregation technique, leveraging the answerability predictions to rerank the documents, prioritising lists that encompass the broadest range of information nuggets. This reranking strategy aims to maximise the number of addressed sub-questions within a shallow cutoff, ensuring a concise yet comprehensive retrieval set.
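The coverage-based aggregation stage, as described, resembles greedy set cover: at each rank, pick the document that answers the most not-yet-covered sub-questions. A minimal sketch under that assumption (the function name and the tie-breaking behaviour are illustrative, not the paper's exact algorithm):

```python
def coverage_rerank(answerable: dict, cutoff: int = 5) -> list:
    """Greedy coverage-based aggregation.

    answerable: doc id -> set of sub-question indices that doc answers
                (e.g. the output of the stage-2 answerability judgements).
    Returns up to `cutoff` doc ids, chosen to maximise covered sub-questions.
    """
    remaining = dict(answerable)
    covered: set = set()
    ranking: list = []
    while remaining and len(ranking) < cutoff:
        # Pick the document contributing the most new sub-questions.
        best = max(remaining, key=lambda d: len(remaining[d] - covered))
        if not remaining[best] - covered:
            break  # no candidate adds new coverage
        covered |= remaining.pop(best)
        ranking.append(best)
    for d in remaining:  # pad leftover slots, e.g. in original retrieval order
        if len(ranking) >= cutoff:
            break
        ranking.append(d)
    return ranking
```

With documents A answering sub-questions {0, 1}, B answering {1, 2}, C answering {3}, and D answering {0}, the greedy pass at cutoff 3 selects A, then B (new nugget 2), then C (new nugget 3), skipping the redundant D.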

Experiments employed two datasets containing fine-grained nugget-level judgments to rigorously evaluate LANCER’s performance. The researchers measured the quality of retrieval using nugget coverage metrics, specifically α-nDCG and overall information coverage, comparing LANCER against other state-of-the-art LLM-based reranking methods optimised for relevance. Oracle analysis, where the system was provided with ideal sub-questions, further revealed the critical role of accurate sub-question generation in achieving optimal performance. Results demonstrate that LANCER consistently achieves higher α-nDCG and information coverage scores, indicating a significant improvement in the ability to retrieve documents covering diverse informational facets. Furthermore, the study highlights the transparency offered by LANCER, as the generated sub-questions and associated answerability scores provide a clear audit trail of the retrieval process, allowing for improved system debugging and refinement.
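For readers unfamiliar with the metric: α-nDCG rewards a ranking for surfacing novel nuggets early and discounts repeats by a factor of (1 − α) per prior occurrence. A minimal sketch of the unnormalised α-DCG follows (dividing by the score of a greedily built ideal ranking then yields α-nDCG); variable names are illustrative:

```python
import math


def alpha_dcg(ranking: list, nuggets: dict, alpha: float = 0.5, cutoff: int = 10) -> float:
    """Unnormalised α-DCG.

    ranking: ordered list of doc ids.
    nuggets: doc id -> set of nugget ids that doc contains.
    A nugget seen r times before contributes (1 - alpha)**r gain,
    discounted by log2(rank + 1).
    """
    seen: dict = {}  # nugget id -> times already retrieved
    score = 0.0
    for rank, doc in enumerate(ranking[:cutoff], start=1):
        doc_nuggets = nuggets.get(doc, set())
        gain = sum((1 - alpha) ** seen.get(n, 0) for n in doc_nuggets)
        for n in doc_nuggets:
            seen[n] = seen.get(n, 0) + 1
        score += gain / math.log2(rank + 1)
    return score
```

For example, with α = 0.5, a second document that only repeats an already-seen nugget earns half the gain it would have earned for a fresh one, which is exactly the redundancy penalty that relevance-only rerankers ignore.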

LANCER improves report information coverage via search reranking

The research team developed LANCER, a novel Long-form Retrieval-Augmented Generation (RAG) system designed to enhance automated report generation. LANCER addresses the challenges of nuanced report requests requiring comprehensive information and citation-supported sentences. The system operates in three stages, focusing on reranking document candidates retrieved in a two-stage pipeline. Experimental results demonstrate LANCER’s effectiveness in improving nugget coverage, a key metric for evaluating the completeness of information in generated reports. Specifically, LANCER utilises answerability judgements to assess the relevance of documents to specific information needs within the report request.

The system then aggregates coverage-based scores, evaluating how well each document addresses those needs. This aggregation process results in a final retrieved context, which is then used by a report generator to synthesize the complete report. The core innovation lies in the system’s ability to identify and prioritize documents that contribute to a comprehensive overview of the requested information. The researchers measured performance using coverage-based evaluation, demonstrating that LANCER effectively improves the quality of the retrieved context, achieving a score of 0.4 on coverage-based aggregation and 0.13 on a related metric. Furthermore, the team highlights the importance of considering multiple information needs within a single report request, a characteristic that distinguishes this work from typical long-form RAG applications. By focusing on comprehensive information retrieval and citation support, LANCER represents a step forward in automated report generation technology.

LANCER boosts RAG via sub-question prediction and answerability judgement

Scientists have developed LANCER, a new large language model (LLM)-based reranking method designed to improve information coverage in long-form retrieval-augmented generation (RAG) systems. Unlike existing methods that primarily focus on relevance ranking, LANCER predicts the sub-questions an information need requires answering, identifies documents addressing those sub-questions, and then reranks documents to maximise coverage of relevant information nuggets. Empirical results demonstrate that LANCER enhances quality as measured by nugget coverage metrics, achieving higher α-nDCG (normalised discounted cumulative gain) and overall information coverage compared to other LLM-based reranking approaches. The research highlights the importance of sub-question prediction in achieving comprehensive information retrieval.

Analysis suggests that aligning LLM answerability judgements with human identification of key information nuggets is crucial, with lower predicted ratings potentially introducing noise. To address this, the authors propose incorporating a logit-trick to better integrate lower-rated predictions. This work establishes a promising direction for optimising nugget-coverage in long-form RAG tasks, offering a retrieval method suitable for scenarios demanding elaborate and comprehensive responses. The authors acknowledge that LLM answerability judgements can be unstable, particularly with lower ratings, which represents a limitation. Future research should focus on refining the proxy sub-question generation and improving the reliability of LLM answerability assessments. This ongoing work, supported by the Hybrid Intelligence Center and the Dutch Research Council, aims to further enhance nugget-coverage optimisation and contribute to more effective long-form RAG systems.
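The "logit-trick" mentioned above plausibly refers to reading the model's logits for the "yes" and "no" answer tokens and taking a softmax to obtain a soft answerability score, rather than a hard binary judgement. A minimal sketch under that assumption (the function name and the two-token framing are the sketch's, not the paper's):

```python
import math


def soft_answerability(logit_yes: float, logit_no: float) -> float:
    """Softmax over the 'yes'/'no' token logits.

    Returns a score in (0, 1): confident 'yes' -> near 1, confident 'no' ->
    near 0, ambivalent -> near 0.5, letting low-confidence judgements be
    down-weighted rather than counted as hard positives or negatives.
    """
    m = max(logit_yes, logit_no)  # subtract the max for numerical stability
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)
```

In the aggregation stage, such soft scores could be summed per document instead of binary votes, so noisy low-rated predictions contribute proportionally to the model's confidence.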

👉 More information
🗞 LANCER: LLM Reranking for Nugget Coverage
🧠 ArXiv: https://arxiv.org/abs/2601.22008

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
