Dual-stage Patient Chart Summarization on Embedded Devices Enables Offline Clinical Insights for Emergency Physicians

Emergency physicians frequently struggle to extract vital information quickly from the increasingly extensive and unstructured data in electronic health records. Jiajun Wu and Swaleh Zaidi from the University of Calgary, Braden Teitge from Alberta Health Services, and colleagues address this problem with a novel system for summarizing patient charts directly on portable devices. Their two-stage approach first retrieves relevant information and then generates both a concise list of critical findings and a detailed narrative summary, preserving patient privacy by operating entirely offline. The work overcomes the privacy limitations of cloud-based solutions and shows that small language models running on embedded devices can produce useful summaries in under 30 seconds, offering a potentially transformative tool for time-critical medical decision-making.

Small Language Models Summarize Health Records

This research investigates whether Small Language Models (SLMs), which are language models scaled down for deployment on devices with limited resources, can summarize Electronic Health Records (EHRs) on edge devices, specifically NVIDIA Jetson Orin Nano units. Current Large Language Models are too large to run on such hardware for real-time clinical use, so the study asks whether SLMs can deliver acceptable performance without cloud connectivity, preserving patient privacy and reducing delays in information access. The core aim is to determine whether these smaller models can effectively condense complex medical histories for clinicians.

Edge computing processes data locally, which minimizes latency, enhances privacy, and allows operation without internet access. Because efficient EHR summarization is crucial for streamlining clinical workflows and improving patient care, the research combines the two to address a critical need in healthcare. The researchers experimented with various SLMs, focusing on reducing model size while maintaining performance. These models were trained on the MIMIC-IV-Note dataset, a publicly available collection of de-identified clinical notes, and deployed on the NVIDIA Jetson Orin Nano, which required optimization to fit within the device’s limited resources.

The study also investigated advanced prompting techniques, including Chain-of-Thought and Plan-and-Solve, to improve summary quality. Human evaluation by medical experts, assessing relevance, coherence, and overall quality, was used alongside automatic metrics like ROUGE and BERTScore. The study demonstrates that deploying SLMs on edge devices for EHR summarization is feasible. Advanced prompting techniques significantly improved summary quality, although a trade-off exists between model size, performance, and computational resources. Smaller models are faster and require less memory, but may produce less accurate summaries.
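ROUGE, one of the automatic metrics mentioned above, compares a generated summary with a reference summary by n-gram overlap. The minimal ROUGE-1 sketch below is a simplified re-implementation (lowercase whitespace tokens, no stemming), not the official scorer and not code from the study. It illustrates one reason automatic metrics fall short for clinical text: a summary that negates a fact can still score highly.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: clipped unigram overlap between a candidate and a reference summary.

    Simplified: lowercase whitespace tokens, no stemming, no punctuation handling.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # each word counts at most min(count_cand, count_ref) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "patient on warfarin"
    contradiction = "patient not on warfarin"  # clinically opposite meaning
    print(round(rouge1_f1(contradiction, reference), 3))  # → 0.857
```

The contradictory summary shares every reference word, so its recall is perfect and only the extra "not" lowers precision, which is why expert review remains necessary.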

Human evaluation remains crucial, as automatic metrics alone are insufficient to assess clinical utility. The results suggest that adapted SLMs can approach the performance of medical experts in clinical text summarization, and that SLMs deployed on edge devices could substantially change how EHRs are summarized in clinical settings. Potential benefits include faster access to summarized patient information, improved privacy through on-site data processing, more efficient clinical workflows, better-informed decisions, and greater accessibility in resource-constrained environments. Future research should focus on further optimizing SLMs for edge deployment, exploring new prompting techniques, conducting larger-scale clinical trials, integrating the system with existing EHR systems, and addressing potential limitations such as inaccurate or biased summaries. This work represents a promising step towards realizing the potential of edge AI for healthcare.

On-Device Summarization of Emergency Department Records

This research presents a novel two-stage summarization system designed for offline clinical use in emergency departments, prioritizing patient privacy and rapid information access. The system uses a dual-device architecture of two Jetson Orin Nano units to process extensive EHR data directly on-site. The first device retrieves locally stored EHRs, segments long notes into coherent sections, and searches those sections for the ones relevant to a clinician’s query. The second device then generates a structured summary from the retrieved material using a locally hosted small language model (SLM).
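The retrieval stage described above (segment long notes into sections, then search them for a query) can be sketched as follows. The article does not say how notes are segmented or how sections are scored, so the header-matching rule, the term-overlap scorer, and the names `segment_note`, `retrieve`, and `Section` are all hypothetical stand-ins, not the authors' retriever.

```python
import re
from dataclasses import dataclass

# Hypothetical header rule: a line that starts with a capitalized label followed by a colon,
# e.g. "Allergies:" or "History of Present Illness:". Real notes need a more careful rule.
HEADER = re.compile(r"^([A-Z][A-Za-z /&-]{2,60}):\s*(.*)$")


@dataclass
class Section:
    title: str
    text: str


def segment_note(note: str) -> list[Section]:
    """Split a note into sections at header lines; text before the first header is 'Preamble'."""
    sections: list[Section] = []
    title, body = "Preamble", []
    for raw in note.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            if body:  # close the previous section (sections with no text are dropped)
                sections.append(Section(title, " ".join(body)))
            title = match.group(1)
            body = [match.group(2)] if match.group(2) else []
        elif line:
            body.append(line)
    if body:
        sections.append(Section(title, " ".join(body)))
    return sections


def terms(text: str) -> set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(sections: list[Section], query: str, k: int = 2) -> list[Section]:
    """Return up to k sections sharing the most terms with the query (ties keep note order)."""
    q = terms(query)
    scored = [(len(q & terms(f"{s.title} {s.text}")), i) for i, s in enumerate(sections)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sections[i] for score, i in scored[:k] if score > 0]


if __name__ == "__main__":
    note = """Chief Complaint: chest pain
History of Present Illness:
65M with chest pain radiating to left arm.
Medications:
warfarin 5 mg daily, metoprolol
Allergies: penicillin"""
    for section in retrieve(segment_note(note), "current anticoagulant medications"):
        print(section.title, "->", section.text)
```

In this example the Medications section is found only because the header word matches; "anticoagulant" never matches "warfarin". A practical retriever would likely need synonym handling or embedding-based search.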

To choose an SLM, the team benchmarked six open-source models, all under 7 billion parameters, on their ability to generate concise and accurate summaries. The summarization output comprises a fixed-format list of critical findings and a narrative tailored to the clinician’s query, providing both essential information and detailed case-specific insights. To assess summary quality rigorously, the researchers developed FA, a risk-weighted, evidence-linked factuality metric. It automatically checks each fact in the generated summary against the original EHR data, using an LLM-as-Judge framework.
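The article does not give FA's formula. One plausible reading of "risk-weighted, evidence-linked" is sketched below under stated assumptions: each summary fact is linked to an EHR span and tagged with a risk category, a judge decides whether that span supports the fact, and the score is the weighted share of supported facts, $\sum_i w_i s_i / \sum_i w_i$ with $s_i \in \{0, 1\}$. The weights, the `Claim` structure, and `naive_judge` (a string-matching placeholder where the real system would call an LLM judge) are all hypothetical; this is not the authors' FA implementation.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical risk weights: a wrong medication or allergy is treated as costlier than other errors.
RISK_WEIGHTS = {"medication": 3.0, "allergy": 3.0, "diagnosis": 2.0, "other": 1.0}


@dataclass
class Claim:
    text: str      # one atomic fact taken from the generated summary
    category: str  # risk category, looked up in RISK_WEIGHTS
    evidence: str  # the EHR span the summary links to this fact


def naive_judge(claim: Claim, ehr: str) -> bool:
    """Stand-in for an LLM-as-Judge call: the cited evidence must appear verbatim in the
    EHR, and every word of the claim must appear in that evidence."""
    if claim.evidence.lower() not in ehr.lower():
        return False
    evidence_words = set(claim.evidence.lower().split())
    return all(word in evidence_words for word in claim.text.lower().split())


def risk_weighted_factuality(
    claims: list[Claim], ehr: str, judge: Callable[[Claim, str], bool] = naive_judge
) -> float:
    """Share of total risk weight carried by supported claims (1.0 = all supported)."""
    total = supported = 0.0
    for claim in claims:
        weight = RISK_WEIGHTS.get(claim.category, RISK_WEIGHTS["other"])
        total += weight
        if judge(claim, ehr):
            supported += weight
    return supported / total if total else 1.0  # no claims: nothing to contradict


if __name__ == "__main__":
    ehr = "Allergies: penicillin. Medications: warfarin 5 mg daily."
    claims = [
        Claim("warfarin 5 mg", "medication", "warfarin 5 mg daily"),  # supported
        Claim("sulfa", "allergy", "Allergies: penicillin"),           # evidence found, claim unsupported
        Claim("chest pain", "other", "chest pain radiating"),         # evidence not in the EHR
    ]
    print(round(risk_weighted_factuality(claims, ehr), 3))  # → 0.429
```

Only one of three claims is supported, but it carries weight 3 out of 7, so the score is 3/7 rather than the unweighted 1/3. Because the check runs against the source record, no reference summary is needed.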

Importantly, the evaluation methodology does not require reference summaries, providing a robust and independent measure of factual correctness. Experiments conducted on both the MIMIC-IV dataset and de-identified real EHRs demonstrate the system’s ability to produce useful summaries in under 30 seconds, highlighting its feasibility for practical deployment in busy emergency department settings. The entire system operates on affordable IoT hardware, further enhancing its potential for widespread adoption.

Offline EHR Summarization for Emergency Medicine

This work presents a breakthrough in emergency medical care through the development of a fully offline, edge-resident system for summarizing electronic health records (EHRs). Recognizing the challenges emergency physicians face when quickly reviewing extensive patient histories, the team engineered a solution that prioritizes patient privacy and operates independently of internet connectivity. The system utilizes two NVIDIA Jetson Orin Nano boards, each dedicated to a specific stage of the summarization process, achieving a lightweight and efficient workflow. The first Nano board retrieves and prepares relevant patient information from the EHR, while the second Nano board hosts a small language model (SLM) that generates the summary.
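The article does not describe how the two boards exchange data or what prompt the SLM receives. As a hedged sketch, assume the retrieval board sends the clinician's query and the retrieved sections to the generation board as JSON (for example over the local network). The generation board can then build a prompt that requests both outputs the system produces, a fixed-format list of critical findings and a query-specific narrative, and parse the fixed-format part back into fields. The JSON layout, the prompt wording, the field names in `CRITICAL_FIELDS`, and all helper functions are hypothetical.

```python
import json

# Hypothetical fixed-format fields; the article does not list the real categories.
CRITICAL_FIELDS = ["Allergies", "Active medications", "Key diagnoses", "Recent procedures"]


def make_payload(query: str, sections: list[tuple[str, str]]) -> str:
    """Stage 1 (retrieval board): bundle the query with retrieved (title, text) sections as JSON."""
    return json.dumps({"query": query, "sections": [{"title": t, "text": x} for t, x in sections]})


def build_prompt(payload: str) -> str:
    """Stage 2 (generation board): ask the SLM for the fixed-format list and the narrative."""
    data = json.loads(payload)
    excerpts = "\n\n".join(f"[{s['title']}]\n{s['text']}" for s in data["sections"])
    fields = "\n".join(f"- {name}:" for name in CRITICAL_FIELDS)
    return (
        "Summarize this patient chart for an emergency physician. Use only the excerpts; "
        "write 'not documented' for any field the excerpts do not cover.\n\n"
        f"EXCERPTS:\n{excerpts}\n\n"
        f"CRITICAL FINDINGS (fill every field):\n{fields}\n\n"
        f"NARRATIVE (answer the question): {data['query']}\n"
    )


def parse_findings(reply: str) -> dict[str, str]:
    """Read '- Field: value' lines from the reply; fields the model skipped become 'MISSING'."""
    found: dict[str, str] = {}
    for raw in reply.splitlines():
        key, sep, value = raw.strip().lstrip("- ").partition(":")
        if sep and key.strip() in CRITICAL_FIELDS:
            found[key.strip()] = value.strip()
    return {name: found.get(name, "MISSING") for name in CRITICAL_FIELDS}


if __name__ == "__main__":
    payload = make_payload("Is the patient anticoagulated?",
                           [("Medications", "warfarin 5 mg daily, metoprolol")])
    print(build_prompt(payload))
    reply = "- Allergies: penicillin\n- Active medications: warfarin 5 mg daily\nWarfarin is active."
    print(parse_findings(reply))
```

A fixed field list makes omissions visible: a field the model skipped is reported as MISSING instead of silently disappearing from the summary.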

This dual-device architecture substantially reduces processing time compared to a single-device approach, enabling the system to produce useful summaries in under 30 seconds. The team benchmarked six open-source SLMs under 7 billion parameters to identify viable models for this task, optimizing performance within the constraints of the embedded hardware. The research also incorporated feedback from emergency physicians, resulting in summaries targeted at the specific needs of the emergency department. The system delivers two key outputs: a fixed-format list of critical findings and a context-specific narrative focused on the clinician’s query. Automated evaluations using an LLM-as-Judge mechanism rate summary quality for factual accuracy, completeness, and clarity, checking whether the summaries meet clinical standards. This innovative approach addresses critical gaps in existing clinical summarization systems, which often rely on cloud-based models and raise privacy concerns.
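A back-of-the-envelope calculation helps explain why the benchmark stops below 7 billion parameters. The sketch below estimates memory for model weights alone as parameters times bits per weight divided by 8. The 8 GB board capacity is an assumption (the article does not state the boards' memory), and the estimate ignores the KV cache, activations, and runtime overhead, so real usage is higher.

```python
# Assumption: an 8 GB board whose memory is shared by the OS, CPU and GPU.
BOARD_MEMORY_GB = 8.0


def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in decimal GB: billions of params x bits / 8 bits per byte.

    Excludes the KV cache, activations, quantization metadata and runtime overhead,
    so actual usage is higher.
    """
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    for params in (1, 3, 7):
        for bits in (16, 8, 4):
            gb = weight_memory_gb(params, bits)
            print(f"{params}B params @ {bits:>2}-bit: {gb:5.2f} GB of weights; "
                  f"below board memory: {gb < BOARD_MEMORY_GB}")
```

By this estimate a 7B model needs about 14 GB for 16-bit weights, more than an 8 GB board holds, whereas 4-bit quantization would bring it to about 3.5 GB. The article does not state which precision the benchmarked models used.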

Offline Clinical Summarization on Edge Devices

This research presents a novel, two-stage system for summarizing electronic health records directly on low-power devices, enabling offline clinical summarization while maintaining patient privacy. The system combines information retrieval with a small language model to generate both a structured list of critical findings and a detailed narrative response to specific clinical queries. Results demonstrate that the system produces useful summaries in under 30 seconds, showcasing the potential of deploying language models on edge devices without relying on cloud connectivity. The team also introduced a new method for evaluating the quality of these summaries, assessing factual accuracy, completeness, and clarity without requiring pre-defined gold-standard summaries.

👉 More information
🗞 Dual-stage and Lightweight Patient Chart Summarization for Emergency Physicians
🧠 ArXiv: https://arxiv.org/abs/2510.06263


Quantum Mechanic
