Researchers are increasingly investigating whether vision foundation models capture underlying biological processes, not just static classifications. Pritika Vig from the Massachusetts Institute of Technology and the Dana-Farber Cancer Institute, alongside Ren-Chin Wu and William Lotter, explored this question in the context of computational pathology, hypothesising that models encoding continuous disease progression would exhibit improved generalisation and facilitate quantitative analysis of disease transitions. Their work uses diffusion pseudotime, a technique borrowed from single-cell transcriptomics, to assess whether foundation models organise disease states along meaningful progression pathways within their internal representations. The team’s findings, across four cancer types and six models, reveal a significant ability to recover disease trajectories, suggesting these models implicitly learn continuous processes from static images and offering a novel metric for evaluating representation quality beyond simple performance benchmarks.
Disease progression captured in vision model representations
Scientists have demonstrated that vision foundation models, trained on independently sampled static images, can implicitly represent continuous disease processes. This research, focused on computational pathology, probes whether these models’ latent representations capture the underlying biological progression of disease, potentially improving generalisation and enabling detailed quantitative analysis of disease transitions. Researchers employed diffusion pseudotime, a technique originally developed for single-cell transcriptomics, to investigate whether foundation models organise disease states along coherent progression pathways within their representation space. The study analysed four distinct cancer progressions using six different models, revealing that all pathology-specific models recovered trajectory orderings significantly exceeding baseline expectations, with vision-only models achieving the highest fidelities, specifically a τ value greater than 0.78 on CRC-Serrated data.
The team found a strong correlation between model rankings based on trajectory fidelity on reference diseases and their few-shot classification performance on previously unseen diseases, with a correlation coefficient of ρ = 0.92. This suggests that the ability to accurately model disease progression serves as a valuable proxy for a model’s capacity to generalise to new clinical scenarios. Exploratory analysis further revealed that cell-type composition varies smoothly along the inferred trajectories, aligning with established knowledge of stromal remodelling processes during disease progression. These findings establish that vision foundation models are capable of learning to represent continuous biological processes from independent, static images, offering a complementary metric for evaluating representation quality beyond traditional downstream performance assessments.
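As a rough illustration of how a rank correlation of this kind could be computed, the sketch below assumes that ρ denotes Spearman's rank correlation, which the article does not state explicitly. The function names are hypothetical, and the scores in the usage lines are made-up placeholders rather than results from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for m in range(i, j + 1):
            ranks[order[m]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (non-constant inputs)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder scores for six hypothetical models (NOT values from the paper).
trajectory_fidelity = [0.81, 0.74, 0.62, 0.55, 0.40, 0.33]
few_shot_accuracy = [0.90, 0.85, 0.88, 0.70, 0.65, 0.60]
print(spearman_rho(trajectory_fidelity, few_shot_accuracy))
```

With only six models, a single swapped pair of ranks moves ρ noticeably, so a coefficient computed this way is best read alongside the raw rankings.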
This work introduces a novel framework for assessing the biological relevance of foundation model representations, moving beyond simple classification accuracy. By adapting diffusion pseudotime analysis, the researchers translated a technique from genomics to visual pathology, providing a new lens through which to understand how these models ‘see’ disease. The study shows that trajectory fidelity, a measure of how well a model captures the order of disease progression, is a strong predictor of performance on unseen data, suggesting that models which implicitly capture the dynamics of disease are more robust and adaptable. While demonstrated in the context of pathology, this framework is broadly applicable to other domains where continuous processes are observed through discrete snapshots, opening avenues for future research in diverse fields.
Diffusion pseudotime maps disease progression in models
Scientists investigated whether vision foundation models capture continuous disease progression within their latent representations. The study pioneered the application of diffusion pseudotime (DPT), a technique originally developed for single-cell transcriptomics, to analyse the organisation of disease states within the representation space of these models. Researchers embedded histopathology image patches using six different foundation models and then employed DPT to compute a pseudotime coordinate for each patch, effectively inferring a trajectory through disease progression. This approach enabled the team to quantify how well the inferred pseudotime ordering aligned with known ordinal ground-truth labels representing disease stages.
The experimental setup involved four distinct cancer progressions, and the team assessed trajectory fidelity by comparing the DPT-inferred ordering with established disease progression sequences. Specifically, the study constructed a nearest-neighbour graph from the embedded image patches, then calculated transition probabilities using random walks to generate the pseudotime coordinate. Across all pathology-specific models tested, the research consistently demonstrated that DPT successfully recovered trajectory orderings significantly exceeding null baselines, with vision-only models achieving the highest fidelities, registering a τ value greater than 0.78 on CRC-Serrated data. Furthermore, the work revealed a strong correlation between model rankings based on trajectory fidelity and their performance on few-shot classification tasks applied to held-out diseases, with a correlation coefficient of ρ = 0.92.
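The pipeline described above (nearest-neighbour graph, random-walk transition probabilities, pseudotime from a root, then comparison against ordinal labels) can be sketched in a few lines of NumPy. This is a simplified illustration and not the authors' implementation: the kernel, the choice of `k`, the root selection, and the function names are assumptions, and τ is assumed here to be Kendall's rank correlation (tau-b, which tolerates ties in stage labels).

```python
import numpy as np


def diffusion_pseudotime(X, root, k=10):
    """Toy DPT: kNN graph -> normalised transition matrix -> distance from root."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between embeddings.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, 0.0)
    order = np.argsort(d2, axis=1)  # column 0 is the point itself
    # Per-point squared bandwidth: squared distance to the k-th neighbour.
    sigma_sq = np.maximum(d2[np.arange(n), order[:, k]], 1e-12)
    W = np.exp(-d2 / np.sqrt(sigma_sq[:, None] * sigma_sq[None, :]))
    # Keep only kNN edges, symmetrised so the graph is undirected.
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), k), order[:, 1:k + 1].ravel()] = True
    W = np.where(mask | mask.T, W, 0.0)
    # Symmetric normalisation; shares eigenvalues with the random-walk matrix.
    deg = W.sum(axis=1)
    T = W / np.sqrt(deg[:, None] * deg[None, :])
    evals, evecs = np.linalg.eigh(T)  # ascending eigenvalues
    # Drop the stationary component (largest eigenvalue, equal to 1 when
    # the graph is connected, which this sketch assumes).
    evals, evecs = evals[:-1], evecs[:, :-1]
    # Summing random walks of all lengths weights each mode by l / (1 - l).
    M = evecs * (evals / (1.0 - evals))
    return np.linalg.norm(M - M[root], axis=1)


def kendall_tau_b(x, y):
    """Kendall's tau-b between two 1-D arrays, with the usual tie correction."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    iu = np.triu_indices(len(x), k=1)
    dx = np.sign(x[:, None] - x[None, :])[iu]
    dy = np.sign(y[:, None] - y[None, :])[iu]
    concordant_minus_discordant = np.sum(dx * dy)
    untied_x, untied_y = np.sum(dx != 0), np.sum(dy != 0)
    return concordant_minus_discordant / np.sqrt(untied_x * untied_y)


# Synthetic stand-in for patch embeddings: a noisy 1-D progression in 2-D.
rng = np.random.default_rng(0)
progress = np.linspace(0.0, 1.0, 120)
embeddings = np.column_stack([progress, 0.01 * rng.standard_normal(120)])
stage = np.minimum((progress * 4).astype(int), 3)  # four ordinal stages

pseudotime = diffusion_pseudotime(embeddings, root=0, k=8)
print(kendall_tau_b(pseudotime, stage))
```

In practice, an established implementation such as the one in a single-cell analysis toolkit would be preferable; the sketch only makes the sequence of steps concrete.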
Exploratory analysis then examined cell-type composition along the inferred trajectories, revealing smooth variations consistent with known stromal remodelling patterns. This detailed analysis demonstrated that vision foundation models can implicitly learn to represent continuous processes from static observations, and that trajectory fidelity serves as a valuable, complementary metric for evaluating representation quality beyond traditional downstream performance measures. The methodology developed in this study offers a novel framework applicable to other domains where continuous processes are observed through discrete snapshots.
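One simple way to look for smooth compositional change of this kind is to bin patches by pseudotime and average the cell-type fractions within each bin. The sketch below assumes each patch already has a pseudotime value and a vector of cell-type fractions from some upstream cell-classification step; the article does not describe how the authors obtained these, so the function name, the equal-width binning, and the toy inputs are all hypothetical.

```python
def composition_by_bin(pseudotime, fractions, n_bins=4):
    """Mean cell-type fractions of patches within equal-width pseudotime bins.

    pseudotime: one float per patch.
    fractions:  one list per patch, giving the fraction of each cell type.
    Returns one mean-fraction list per bin, or None for an empty bin.
    """
    lo, hi = min(pseudotime), max(pseudotime)
    width = (hi - lo) / n_bins if hi > lo else 1.0
    n_types = len(fractions[0])
    sums = [[0.0] * n_types for _ in range(n_bins)]
    counts = [0] * n_bins
    for t, frac in zip(pseudotime, fractions):
        # Clamp so the maximum pseudotime falls into the last bin.
        b = min(int((t - lo) / width), n_bins - 1)
        counts[b] += 1
        for j, value in enumerate(frac):
            sums[b][j] += value
    return [
        [s / c for s in row] if c else None
        for row, c in zip(sums, counts)
    ]


# Toy input: two cell types whose balance shifts along pseudotime.
toy_pseudotime = [0.0, 0.1, 0.9, 1.0]
toy_fractions = [[1.0, 0.0], [0.8, 0.2], [0.2, 0.8], [0.0, 1.0]]
for bin_index, mean_fraction in enumerate(
    composition_by_bin(toy_pseudotime, toy_fractions, n_bins=2)
):
    print(bin_index, mean_fraction)
```

Equal-width bins can leave some bins sparsely populated; quantile-based bins are a reasonable alternative when patches cluster at particular stages.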
Disease progression recovered via vision models
Scientists have demonstrated that vision foundation models can implicitly represent continuous biological processes from static images. Researchers probed whether these models organise disease states along coherent progression directions within their representation space, utilising diffusion pseudotime, a technique originally developed for single-cell transcriptomics. Across four cancer progressions and six models, the study consistently found that pathology-specific models accurately recover trajectory orderings, significantly exceeding expected random baselines. Notably, vision-only models exhibited the highest fidelity in representing colorectal cancer progression, and model rankings based on trajectory fidelity correlated strongly with classification performance on independent datasets.
Exploratory analysis revealed that cell-type composition changes smoothly along these inferred trajectories, aligning with established understanding of stromal remodelling during disease progression. This suggests that the models are not merely classifying discrete disease states, but are capturing underlying continuous changes. The authors acknowledge that their study focuses on morphological cancer progressions and may not generalise to more complex branching trajectories or those with multiple causes. Future work should focus on assessing generalisation across diverse datasets and patient populations.
These findings demonstrate that vision foundation models possess an inherent capacity to model continuous biological structure beyond simple classification tasks. This capability opens avenues for analysing disease progression at a finer granularity than currently possible, potentially enabling earlier detection of at-risk lesions and more tailored treatment strategies. The framework developed could also be extended to other vision domains where continuous processes are observed through static snapshots, offering a method for evaluating the dynamic information captured within learned representations.
👉 More information
🗞 Do Pathology Foundation Models Encode Disease Progression? A Pseudotime Analysis of Visual Representations
🧠 ArXiv: https://arxiv.org/abs/2601.21334
