Pathological diagnosis, a cornerstone of modern medicine, increasingly benefits from advances in artificial intelligence, yet most current systems produce a single, static assessment of a complex whole-slide image. Shengyi Hua, Jianfeng Wu, Tianle Shen, and colleagues address this limitation with PathFound, an agentic multimodal model that actively seeks evidence to refine its diagnoses. The approach mirrors the iterative process clinicians follow, in which initial observations prompt further investigation and targeted examination of tissue samples, and it marks a step towards more accurate and reliable computational pathology. The team reports that PathFound consistently improves diagnostic accuracy across multiple large datasets and is particularly good at identifying subtle, clinically relevant details, suggesting that artificial intelligence could enhance, rather than simply replicate, the expertise of pathologists.
Pathology Foundation Models for Cancer Diagnosis
Recent research demonstrates a significant shift towards building large, general-purpose foundation models for pathology, capable of adapting to numerous tasks. These models are increasingly multimodal, combining image data with textual information like reports and annotations to enhance performance, and agent-based approaches are emerging, aiming to replicate the reasoning process of pathologists. Several teams have developed foundation models trained on extensive real-world clinical data, including models designed for rare cancer detection and comprehensive clinical task coverage. These advancements represent a major trend in the field, with models like CPath-Omni and Virchow2 pushing the boundaries of whole-slide image analysis.
Alongside foundation models, researchers are exploring AI agents that simulate the diagnostic process, using reinforcement learning to train systems that follow a pathologist’s reasoning. Agent-based systems such as CPathAgent and Patho-R1 use visual chain-of-thought reasoning to improve diagnostic accuracy. Other work targets specific image-analysis techniques, such as rotation-equivariant CNNs, or the generation of textual descriptions for pathology images; PathGen-1.6M, for example, produces image-text pairs through multi-agent collaboration.
PathFound, An Iterative Diagnostic Framework
The research team developed PathFound, an agentic multimodal framework that mimics the iterative process of pathological diagnosis rather than relying on single-pass image analysis. It addresses a key limitation of current visual representation learning, which typically processes a whole-slide image only once, whereas in clinical practice diagnoses are refined through repeated observation and requests for further examination. PathFound operates as a dynamic loop that integrates perception, reasoning, and targeted information retrieval across three stages (exploration, execution, and exploitation) to simulate a pathologist’s diagnostic workflow. The system is built from three modules, the first of which is a slide highlighter that identifies regions of interest within the whole-slide image so that later analysis can focus on them.
The vision interpreter then analyzes the highlighted regions, extracting visual features and contextual information, while the diagnostic reasoner combines this information with existing knowledge to formulate a diagnosis and guide further exploration. Experiments under a flexible protocol demonstrate the system’s adaptability, and the evidence-seeking strategy consistently improves diagnostic accuracy across large multimodal datasets. The framework is reported to achieve state-of-the-art performance, with a strong capacity to identify subtle details that automated systems have previously struggled with.
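The loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: every name here (`CaseState`, `run_diagnostic_loop`, and the three toy callables) is hypothetical, and reading the three stages as a broad first look, targeted evidence gathering, and a final decision is an assumption rather than something the paper's summary spells out. The stubs stand in for the slide highlighter, vision interpreter, and diagnostic reasoner only to show how control might pass between them.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class CaseState:
    """Evidence and conclusion accumulated for one slide (hypothetical structure)."""
    observations: List[str] = field(default_factory=list)
    diagnosis: str = "undetermined"
    confident: bool = False


def run_diagnostic_loop(
    highlight: Callable[[str], List[str]],  # query -> region identifiers
    interpret: Callable[[str], str],        # region identifier -> textual observation
    reason: Callable[[List[str]], Tuple[str, bool, Optional[str]]],
    max_rounds: int = 3,
) -> CaseState:
    """Alternate between gathering evidence and reasoning until confident."""
    state = CaseState()
    query: Optional[str] = "overview"  # assumed starting point: a broad first look
    for _ in range(max_rounds):
        # Gather evidence for the current query and turn it into text.
        for region in highlight(query):
            state.observations.append(interpret(region))
        # The reasoner returns a diagnosis, a confidence flag, and the next query.
        state.diagnosis, state.confident, query = reason(state.observations)
        if state.confident or query is None:
            break  # final decision reached, or nothing further to request
    return state


# Toy stand-ins for the three modules; real components would be learned models.
def toy_highlight(query: str) -> List[str]:
    return [f"{query}-roi-{i}" for i in range(2)]


def toy_interpret(region: str) -> str:
    return f"observation from {region}"


def toy_reason(observations: List[str]) -> Tuple[str, bool, Optional[str]]:
    if len(observations) >= 4:
        return ("final diagnosis", True, None)
    return ("provisional diagnosis", False, "nuclear-detail")


state = run_diagnostic_loop(toy_highlight, toy_interpret, toy_reason)
print(state.diagnosis, len(state.observations))
```

In this toy run the reasoner is unsure after the first pass, requests a second, more specific look, and then commits. The `max_rounds` cap is an illustrative safeguard so the loop always terminates.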
PathFound, An Iterative Diagnostic Reasoning System
PathFound is an agentic multimodal system that changes how pathological diagnosis is approached, replacing static image analysis with an iterative, evidence-seeking process. The design mirrors the clinical workflow of pathologists, who refine diagnoses through repeated slide observations and requests for further examination. The team designed PathFound to proactively acquire information and refine its conclusions, progressing through stages of initial diagnosis, targeted evidence gathering, and final decision-making. Experiments show that this evidence-seeking strategy consistently improves diagnostic accuracy across several large multimodal datasets.
PathFound integrates three components: a slide highlighter that distills large whole-slide images into representative regions of interest, a vision interpreter that translates these regions into textual observations, and a diagnostic reasoner trained with reinforcement learning. The diagnostic reasoner orchestrates the entire process, managing evidence acquisition and user interaction in a way that mirrors the coarse-to-fine reasoning of experienced pathologists. The authors report state-of-the-art diagnostic performance and a strong ability to discover subtle details, including nuanced nuclear features and localized invasion patterns.
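To make the idea of distilling a slide into a few representative regions concrete, here is a small Python sketch. The summary does not say how PathFound's slide highlighter actually selects regions, so this swaps in a simple top-k selection over per-tile relevance scores. The function name `select_regions_of_interest`, the tile grid, and the scores are all hypothetical, and the scores are assumed to come from some upstream model.

```python
import heapq
from typing import Dict, List, Tuple

Tile = Tuple[int, int]  # (row, col) position of a tile in the slide grid


def select_regions_of_interest(tile_scores: Dict[Tile, float], k: int = 3) -> List[Tile]:
    """Return the k highest-scoring tiles, best first."""
    return heapq.nlargest(k, tile_scores, key=tile_scores.get)


# Made-up relevance scores for a tiny 3x2 grid of tiles.
scores = {
    (0, 0): 0.10,
    (0, 1): 0.85,
    (1, 0): 0.40,
    (1, 1): 0.92,
    (2, 0): 0.05,
}

top_tiles = select_regions_of_interest(scores, k=2)
print(top_tiles)
```

Only the selected tiles would then be passed on for interpretation, which keeps the downstream text generation focused on a handful of regions instead of the full gigapixel image.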
Agentic Reasoning Improves Diagnostic Accuracy
PathFound represents a significant advance in computational pathology, introducing an agentic multimodal model that mimics the iterative reasoning process of pathologists. Unlike existing systems that produce a single diagnosis from a whole-slide image, this model actively seeks evidence, refines hypotheses, and revisits areas of interest to improve diagnostic accuracy. The system integrates three key components that together simulate the stages of clinical diagnosis: a slide highlighter to identify relevant regions, a vision interpreter to translate these regions into textual observations, and a diagnostic reasoner trained with reinforcement learning. The research team demonstrates that this evidence-seeking approach consistently improves diagnostic performance across multiple clinical scenarios, achieving state-of-the-art results in identifying subtle details crucial for accurate diagnosis, such as nuclear features and localized invasion. While the model performs strongly, further validation with diverse clinical samples is necessary, and future work could explore the integration of additional data types and refinement of the reinforcement learning framework to enhance the model’s reasoning abilities.
👉 More information
🗞 PathFound: An Agentic Multimodal Model Activating Evidence-seeking Pathological Diagnosis
🧠 ArXiv: https://arxiv.org/abs/2512.23545
