Accessing relevant patient information within Electronic Health Records (EHRs) is becoming ever more crucial for effective clinical decision-making, but current question-answering systems often fall short when applied to real-world data. Lingfei Qian, Mauro Giuffre, and Yan Wang, from Beth Israel Deaconess Medical Center, alongside Huan He, Qianqian Xie, and Xuguang Ai, address this challenge with EHRNavigator, a novel multi-agent system designed for patient-level clinical question answering. This research is significant because it moves beyond standard benchmark datasets to evaluate performance directly on heterogeneous hospital data, incorporating complex factors like varying data schemas and the need for temporal reasoning. Through rigorous testing and validation by clinicians, EHRNavigator achieves 86% accuracy in real-world scenarios, demonstrating a substantial step towards practical and efficient EHR data access for improved patient care.
EHR Data Navigation for Patient Question Answering
Question answering (QA) systems are typically assessed only on standard datasets, which limits how much those results say about their usefulness in real-world medical settings. EHRNavigator employs artificial intelligence agents to navigate and synthesise information from both structured data, such as lab results and medication orders, and unstructured data like clinical notes and medical images. Its performance was evaluated using both publicly available benchmark datasets and data from a real hospital environment, reflecting the challenges of varied data formats and the need to understand information over time.
This included assessing its ability to integrate evidence from multiple sources within the EHR. Quantitative analysis and review by clinicians confirmed the system’s ability to generalise effectively, achieving 86% accuracy in real-world scenarios while maintaining response times suitable for clinical use. A key problem EHRNavigator addresses is the time-consuming and potentially error-prone process of manual chart review. Clinicians currently spend significant time extracting relevant information from EHRs: an average of 16 minutes and 14 seconds per patient encounter, a third of which goes to reviewing charts.
The system aims to improve efficiency by intelligently querying EHRs and providing clinicians with concise, time-sensitive information for better decision-making, particularly in complex cases like monitoring patients on antibiotics. The research highlights the challenges posed by the volume and fragmentation of EHR data, citing that patients in the MIMIC-III intensive care database have an average of 45.1 clinical notes totalling over 20,000 words. EHRNavigator offers a solution by providing a robust and adaptive system capable of handling this large amount of heterogeneous data, bridging the gap between benchmark testing and practical clinical deployment for EHR question answering.
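The figures quoted above can be turned into rough per-encounter and per-note numbers. The calculation below is a sketch that uses only values stated in the text; treating "a third" as exactly 1/3 and "over 20,000 words" as exactly 20,000 are simplifying assumptions, so the words-per-note result is a lower bound. The helper name `format_min_sec` is introduced here for illustration.

```python
# Figures quoted in the text above.
ENCOUNTER_SECONDS = 16 * 60 + 14   # 16 min 14 s per patient encounter
CHART_REVIEW_FRACTION = 1 / 3      # "a third", assumed to be exactly 1/3
NOTES_PER_PATIENT = 45.1           # MIMIC-III average number of notes
WORDS_PER_PATIENT = 20_000         # "over 20,000", so a lower bound


def format_min_sec(seconds: float) -> str:
    """Render a duration in seconds as 'M min S s', rounded to the second."""
    whole = round(seconds)
    return f"{whole // 60} min {whole % 60} s"


# Time spent on chart review per encounter.
review_seconds = ENCOUNTER_SECONDS * CHART_REVIEW_FRACTION

# Average length of a single clinical note (lower bound).
words_per_note = WORDS_PER_PATIENT / NOTES_PER_PATIENT

print("Chart review per encounter:", format_min_sec(review_seconds))
print("Words per note (at least):", round(words_per_note))
```

Under these assumptions, chart review takes roughly 5 minutes 25 seconds of each encounter, and an average note runs to at least about 443 words.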
EHRNavigator Agentic System for Patient-Level Question Answering
The research team developed EHRNavigator, a multi-agent framework designed to address limitations in current Electronic Health Record (EHR) question-answering systems. Rather than relying on benchmark datasets alone, the system performs patient-level question answering across both structured and unstructured EHR data, mirroring the diagnostic process of clinicians. The core of EHRNavigator is its agentic architecture, which decomposes complex queries into modular sub-tasks and dynamically invokes specialized tools as needed to synthesize outputs. The team engineered the system to synthesize findings from disparate tables and clinical narratives, generating evidence-backed responses alongside transparent reasoning pathways.
This approach allows clinicians to trace insights directly to their source data, enhancing trust and interpretability. Crucially, EHRNavigator operates across diverse database structures without requiring schema-specific training, ensuring interoperability between institutions with differing data architectures. The team demonstrated this capability by dynamically mapping clinical concepts to institution-specific schemas, such as MIMIC-III and OMOP CDM, without manual retraining. Experiments employed four datasets, EHRSQL, EHRNoteQA, DrugEHRQA, and YNHHQA, to systematically evaluate the system’s performance across various clinical question types.
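One way to picture schema-agnostic querying is a lookup from a clinical concept to each schema's table and column names. The sketch below is illustrative only: the static `SCHEMA_MAP` dictionary, the `FieldMap` class, the `build_query` helper, and the concept keys are all hypothetical, and they stand in for the dynamic mapping described above, whose actual mechanism is not given in this article. The table and column names follow the public MIMIC-III and OMOP CDM layouts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMap:
    table: str
    patient_col: str
    time_col: str
    value_col: str


# Hypothetical concept-to-schema mapping; a real system would derive this
# at run time rather than hard-code it.
SCHEMA_MAP = {
    "mimic3": {
        "lab_result": FieldMap("labevents", "subject_id", "charttime", "valuenum"),
        "medication_order": FieldMap("prescriptions", "subject_id", "startdate", "drug"),
    },
    "omop": {
        "lab_result": FieldMap(
            "measurement", "person_id", "measurement_datetime", "value_as_number"
        ),
        "medication_order": FieldMap(
            "drug_exposure", "person_id", "drug_exposure_start_datetime", "drug_concept_id"
        ),
    },
}


def build_query(schema: str, concept: str) -> str:
    """Return a parameterised SQL string for one concept in one schema."""
    try:
        f = SCHEMA_MAP[schema][concept]
    except KeyError as exc:
        raise ValueError(f"no mapping for {concept!r} in schema {schema!r}") from exc
    return (
        f"SELECT {f.time_col}, {f.value_col} FROM {f.table} "
        f"WHERE {f.patient_col} = ? ORDER BY {f.time_col}"
    )


# The same clinical concept yields different SQL per institution schema.
print(build_query("mimic3", "lab_result"))
print(build_query("omop", "lab_result"))
```

The caller asks for a concept such as `lab_result` and never names a table directly, so supporting another schema means adding a mapping rather than changing the question-answering logic.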
The study pioneered the evaluation of an EHR-QA system specifically for longitudinal trajectory reasoning using real-world clinical data, moving beyond typical point-in-time queries. Researchers compared performance against vanilla Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) baselines, measuring execution accuracy and generative semantic alignment using metrics such as ROUGE-L for EHRNoteQA. The evaluation combined public benchmarks with data from a hospital setting, focusing on diverse data schemas, the need for temporal reasoning, and the integration of multiple data sources, and it analyzed implementation factors such as system latency in a live hospital environment. On real-world clinical questions, EHRNavigator achieved 86% accuracy.
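The two kinds of metric mentioned above can be sketched in a few lines. ROUGE-L scores a generated answer by the longest common subsequence (LCS) it shares with a reference answer. The version below uses whitespace tokenisation and a plain F1, which is a simplification and may differ from the scorer used in the study; `execution_accuracy` is likewise a simplified, assumed definition (the share of questions whose executed result equals the expected result).

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L as the F1 of LCS-based precision and recall."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def execution_accuracy(predicted: list, expected: list) -> float:
    """Fraction of questions whose executed result matches the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


score = rouge_l_f1("the cat sat on the mat", "the cat is on the mat")
print(f"ROUGE-L F1: {score:.3f}")
print("Execution accuracy:", execution_accuracy([1, 2, 3, 4], [1, 2, 0, 4]))
```

In the toy example the two sentences share a five-token subsequence out of six tokens each, so precision, recall, and F1 are all 5/6. Three of the four toy query results match, giving an execution accuracy of 0.75.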
Crucially, the system maintained clinically acceptable response times, demonstrating its potential for immediate use within a hospital environment. The work details a system capable of synthesizing information from both structured data, such as laboratory results and medication orders, and unstructured data like free-text clinical notes. Patients in the MIMIC-III intensive care database average 45.1 clinical notes, totaling over 20,000 words, which illustrates the volume of data clinicians currently manage. EHRNavigator employs a network of specialized agents to decompose complex queries into manageable tasks, invoking tools as needed to synthesize outputs.
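The decompose-then-invoke pattern described above can be sketched with a toy planner and two tools. Everything in this example is hypothetical: the names (`plan`, `answer`, `structured_tool`, `notes_tool`), the keyword rules standing in for an LLM-driven planner, and the invented patient records. It is not EHRNavigator's implementation, only a minimal instance of splitting a question into sub-tasks, calling a tool for each, and returning an answer together with its sources.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Evidence:
    source: str   # table name or note id the fact came from
    content: str  # the supporting fact itself


# Toy in-memory records; every value here is invented for illustration.
STRUCTURED = {"p1": [("labs", "WBC 14.2 K/uL on 2024-01-02")]}
NOTES = {"p1": [("note-7", "Started antibiotic ceftriaxone for suspected pneumonia.")]}


def _search(store: dict, patient_id: str, keyword: str) -> list[Evidence]:
    """Case-insensitive keyword match over one patient's records."""
    return [
        Evidence(source, text)
        for source, text in store.get(patient_id, [])
        if keyword.lower() in text.lower()
    ]


def structured_tool(patient_id: str, keyword: str) -> list[Evidence]:
    return _search(STRUCTURED, patient_id, keyword)


def notes_tool(patient_id: str, keyword: str) -> list[Evidence]:
    return _search(NOTES, patient_id, keyword)


TOOLS: dict[str, Callable[[str, str], list[Evidence]]] = {
    "structured": structured_tool,
    "notes": notes_tool,
}


def plan(question: str) -> list[tuple[str, str]]:
    """Hypothetical planner: keyword rules stand in for an LLM-driven agent."""
    q = question.lower()
    steps = []
    if "wbc" in q or "white blood cell" in q:
        steps.append(("structured", "WBC"))
    if "antibiotic" in q:
        steps.append(("notes", "antibiotic"))
    return steps


def answer(patient_id: str, question: str) -> tuple[str, list[Evidence]]:
    """Run each planned sub-task, then join the findings with their sources."""
    evidence: list[Evidence] = []
    for tool_name, keyword in plan(question):
        evidence.extend(TOOLS[tool_name](patient_id, keyword))
    if not evidence:
        return "No supporting evidence found.", []
    text = "; ".join(f"{e.content} [{e.source}]" for e in evidence)
    return text, evidence


text, evidence = answer("p1", "What was the WBC, and which antibiotic was started?")
print(text)
```

Returning the `Evidence` list alongside the answer text is what lets a reader trace each statement to a table or note, which is the traceability property the article attributes to the system.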
This agentic architecture allows the system to reason autonomously across structured and unstructured data, mirroring a clinician’s diagnostic process. The framework bridges the gap between benchmark evaluation and clinical deployment, offering a robust and adaptive solution for EHR question answering. The evaluation indicates that the system can handle temporally grounded questions, such as tracking changes in a patient’s white blood cell count over time, a capability often overlooked in existing systems. EHRNavigator does not rely on predefined question templates or schema-specific fine-tuning, which improves its flexibility and generalizability across different hospital databases. Tests show that it can integrate information from multiple data sources, overcoming the limitations of systems that handle only structured data or only unstructured notes. The research also highlights the importance of evaluating system latency, a critical factor for bedside use, and reports that EHRNavigator operates efficiently within the constraints of a live clinical setting.
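A temporally grounded question such as the white blood cell example above reduces to filtering measurements by a time window, ordering them, and comparing the endpoints. The sketch below assumes invented lab values and a hypothetical helper, `wbc_trend`; it illustrates the kind of reasoning involved, not the system's own method.

```python
from datetime import datetime

# Invented lab rows, deliberately out of order: (timestamp, WBC in 10^3 cells/uL).
WBC_ROWS = [
    ("2024-01-03T08:00", 11.8),
    ("2024-01-01T08:00", 15.6),
    ("2024-01-02T08:00", 13.9),
    ("2024-02-10T08:00", 7.2),   # outside the window queried below
]


def wbc_trend(rows, start: datetime, end: datetime):
    """Return time-ordered (timestamp, value) pairs in [start, end] and the net change.

    The net change is None when fewer than two measurements fall in the window.
    """
    window = sorted(
        (datetime.fromisoformat(ts), value)
        for ts, value in rows
        if start <= datetime.fromisoformat(ts) <= end
    )
    if len(window) < 2:
        return window, None
    change = round(window[-1][1] - window[0][1], 2)
    return window, change


window, change = wbc_trend(WBC_ROWS, datetime(2024, 1, 1), datetime(2024, 1, 3, 23, 59))
direction = "falling" if change < 0 else "rising" if change > 0 else "stable"
print([value for _, value in window], change, direction)
```

Sorting by timestamp before comparing matters because records are not guaranteed to arrive in chronological order, and restricting the window keeps an unrelated later measurement from distorting the trend.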
EHRNavigator represents a significant advance in clinical question answering systems by moving beyond benchmark datasets to demonstrate performance within realistic hospital settings. The multi-agent framework successfully integrates and synthesises information from heterogeneous electronic health records, achieving 86% accuracy on real-world clinical questions while maintaining acceptable response times. This capability addresses a critical need for clinicians, automating the time-consuming process of reconstructing patient histories and synthesising information across multiple visits, a task that current systems often fail to address effectively. The research acknowledges limitations in handling complex, concept-driven queries and the current single-round querying paradigm. Future work will focus on incorporating explicit temporal reasoning, visit-context disambiguation, and integration with medical ontologies to improve performance on more nuanced clinical questions. Development of a multi-turn dialogue capability is also planned, aiming to replicate the iterative reasoning process employed by physicians and further enhance the system’s clinical utility.
👉 More information
🗞 EHRNavigator: A Multi-Agent System for Patient-Level Clinical Question Answering over Heterogeneous Electronic Health Records
🧠 ArXiv: https://arxiv.org/abs/2601.10020
