Researchers are tackling the limitations of current large language models in healthcare by moving beyond simple text summarisation of patient records. Irsyad Adam, Zekai Chen, and David Laprade, from Standard Model Biomedicine, alongside colleagues including Shaun Porwal, David Laub, and Erik Reinertsen, present a new training paradigm that views patients as dynamic systems rather than static documents. Their work introduces SMB-Structure, a world model for structured electronic health records, which forces models to simulate disease trajectories and encode underlying dynamics, rather than merely predicting the next token in a sequence. Validated on substantial oncology and pulmonary embolism cohorts, this approach yields embeddings that capture crucial disease progression information currently missed by standard autoregressive models, potentially unlocking improved performance on complex, heterogeneous clinical tasks and offering a significant step towards more nuanced and predictive healthcare AI.
The research addresses a fundamental limitation of current large language models (LLMs) trained with next-word prediction, which treat patients as information to be summarised instead of evolving systems. This innovative approach introduces a world model for structured electronic health records (EHR) that combines supervised fine-tuning (SFT) with a joint-embedding prediction architecture (JEPA), enabling the simulation of disease progression over time. SFT reconstructs future patient states in token space, while JEPA predicts these futures directly in latent space from initial patient representations, compelling the encoding of trajectory dynamics before the next state is observed.
The team achieved this by forcing the model to predict future patient states in a latent space, rather than reconstructing them from input data, thereby prioritising the capture of abstract dynamics. This differs from conventional autoregressive models where trajectory dynamics can be deferred until decoding, as JEPA requires the encoder to predict the future embedding before observing the future state. Experiments were conducted using two large-scale cohorts, encompassing 23,319 oncology patients with over 323,000 patient-years of data from Memorial Sloan Kettering, and 19,402 pulmonary embolism patients from the INSPECT cohort. The study validates the approach not at isolated time points, but along the entire disease trajectory, demonstrating the ability to capture dynamics that are inaccessible to autoregressive baselines.
This breakthrough reveals that SMB-Structure learns embeddings capable of capturing disease dynamics, enabling competitive performance on complex tasks characterised by high patient heterogeneity. The research establishes a new benchmark for modelling longitudinal EHR data, moving beyond simple sequence prediction to a more nuanced understanding of patient evolution. Model weights for the SMB-v1-1.7B-Structure are publicly available on Hugging Face, facilitating further research and development in the field. The work opens new avenues for clinical decision support systems that can anticipate the future course of disease, rather than simply documenting its present state.
Simulating patient trajectories with SMB-Structure and JEPA
Scientists developed SMB-Structure, a novel world model for structured electronic health records (EHR) that integrates a joint-embedding prediction architecture (JEPA) with next-token prediction via supervised fine-tuning (SFT). The study pioneers a method to simulate patient trajectories, moving beyond treating patients as simple documents to be summarised, and instead modelling them as dynamical systems. Researchers formalised a patient’s clinical trajectory as a sequence of states, each encoded as medical tokens encompassing demographics, conditions, measurements, procedures, medications, and clinical notes. Given a context and future state, the team sought an encoder capable of capturing the dynamics governing the transition between them.
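The serialisation step described above, flattening a patient state's clinical categories into a delimiter-tagged token string, can be sketched as follows. This is a minimal illustration; the delimiter token names and field values here are hypothetical, not the paper's actual vocabulary.

```python
# Hypothetical delimiter tokens marking clinical field boundaries.
# The paper adds such tokens to the LLM vocabulary and learns their
# embeddings during fine-tuning; these specific names are illustrative.
DELIMITERS = {
    "demographics": "<demo>",
    "conditions": "<cond>",
    "measurements": "<meas>",
    "procedures": "<proc>",
    "medications": "<med>",
    "notes": "<note>",
}

def serialise_state(state: dict) -> str:
    """Flatten one clinical state into a delimiter-tagged token string."""
    parts = []
    for field, tag in DELIMITERS.items():
        values = state.get(field, [])
        if values:  # skip empty clinical categories
            parts.append(tag + " " + " ".join(values))
    return " ".join(parts)

state = {
    "demographics": ["age_67", "male"],
    "conditions": ["C34.90"],          # illustrative ICD-10 code
    "medications": ["pembrolizumab"],
}
print(serialise_state(state))
# → <demo> age_67 male <cond> C34.90 <med> pembrolizumab
```

A patient trajectory is then the concatenation of such serialised states in temporal order, which is what the encoder consumes.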
Experiments employed a training paradigm requiring the encoder to predict future embeddings before observing future tokens, addressing a limitation of standard autoregressive training, which can defer reasoning to decoding time. Specifically, the team implemented the objective h_i = g_φ(f_θ(Mask(x ⊕ y, M)))_i ≈ f̄_θ(x ⊕ y)_i for all i ∈ M, where g_φ is a predictor network applied to the online encoder f_θ, f̄_θ is the momentum encoder that produces the targets, and M denotes the set of masked positions. SMB-Structure extends a pretrained large language model (LLM) backbone with three key components: domain-specific clinical tokens, a bottleneck predictor for latent-space prediction, and a momentum encoder for stable targets. Patient records were serialised using delimiter tokens to demarcate clinical categories; these tokens were added to the vocabulary and their embeddings learned during fine-tuning, explicitly exposing clinical field boundaries to the LLM.
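The latent-prediction objective above can be illustrated with a toy numpy sketch. Linear maps stand in for the transformer encoders, and the predictor is folded into the online pass; the point is the structure of the loss, matching online predictions at masked continuation positions against momentum-encoder targets, not the real architecture.

```python
import numpy as np

# Toy sketch of the JEPA objective: linear "encoders" stand in for
# f_theta (online) and the momentum copy; shapes are illustrative.
rng = np.random.default_rng(0)
seq_len, d_model = 8, 4

def encode(tokens, W):
    return tokens @ W  # stand-in for a transformer encoder

W_online = rng.normal(size=(d_model, d_model))
W_target = W_online.copy()  # momentum encoder starts as a copy of the online weights

tokens = rng.normal(size=(seq_len, d_model))  # embeddings of x ⊕ y
mask = np.zeros(seq_len, dtype=bool)
mask[seq_len // 2::2] = True  # mask positions only in the continuation y

masked_tokens = tokens.copy()
masked_tokens[mask] = 0.0  # stand-in for the learnable mask token

h_pred = encode(masked_tokens, W_online)   # g_phi(f_theta(Mask(x ⊕ y, M)))
h_target = encode(tokens, W_target)        # momentum-encoder targets (no gradient in practice)

# Loss is computed only at masked positions, so the encoder must predict
# the future state's embedding without having observed its tokens.
l_jepa = np.mean((h_pred[mask] - h_target[mask]) ** 2)
print(l_jepa)
```

In the actual model the target branch receives no gradients, so only the online encoder and predictor are trained against the slowly moving targets.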
The predictor refines encoder representations by projecting them to a lower dimension via a bottleneck architecture, encouraging abstract predictive features. Masked positions were replaced with a learnable token, then processed through layers defined by H_bottleneck = H W_down and Ĥ = LayerNorm(Transformer(H_bottleneck) W_up). A momentum encoder, updated with a high momentum of τ = 0.996, maintains an exponential-moving-average copy of the encoder, providing stable target embeddings. The SFT objective grounds the model in clinical semantics through next-token prediction, with loss L_SFT = −(1/m) Σ_{t=1}^{m} log p_θ(y_t | y_{<t}, x). The JEPA objective, in turn, forces trajectory dynamics into the representation by predicting in latent space before observing future states, masking 50% of target tokens exclusively in the continuation. The prediction target is the latent embedding produced by the momentum encoder, removing incentives to model surface-form statistics and encouraging state-level abstractions. This asymmetric online/target design prevents degenerate fixed points, ensuring representations remain predictable and meaningful, and was validated across cohorts of 23,319 oncology patients and 19,402 pulmonary embolism patients.
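Two of the ingredients above, the momentum-encoder update with τ = 0.996 and the next-token cross-entropy of the SFT objective, are standard enough to sketch directly. This is a minimal numpy illustration with toy shapes; parameter names are placeholders for the model's actual weight tensors.

```python
import numpy as np

TAU = 0.996  # momentum coefficient from the paper

def ema_update(target_params, online_params, tau=TAU):
    """Momentum-encoder update: theta_target <- tau*theta_target + (1-tau)*theta_online."""
    return tau * target_params + (1 - tau) * online_params

def sft_loss(logits, targets):
    """Toy L_SFT = -(1/m) * sum_t log p_theta(y_t | y_<t, x) for one sequence.

    `logits` has shape (m, vocab); `targets` holds the m gold token ids.
    """
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

rng = np.random.default_rng(1)
target_w = rng.normal(size=3)
online_w = rng.normal(size=3)
# With tau = 0.996 the target weights drift only slightly per step,
# which is what keeps the JEPA targets stable during training.
print(ema_update(target_w, online_w))

logits = rng.normal(size=(5, 10))  # m = 5 future tokens, toy vocab of 10
y = rng.integers(0, 10, size=5)
print(sft_loss(logits, y))
```

The combined training signal is then the SFT loss plus the latent JEPA loss, with the EMA update applied to the target encoder after each optimiser step.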
SMB-Structure captures dynamic disease trajectories effectively
Scientists have developed SMB-Structure, a new approach to modelling electronic health records (EHR) that moves beyond treating patients as static documents and instead simulates their dynamic evolution as a system. The research team validated their work across two large-scale cohorts, encompassing 23,319 oncology patients with over 323,000 patient-years of data from Memorial Sloan Kettering, and 19,402 pulmonary embolism patients from the INSPECT database. Experiments revealed that SMB-Structure learns embeddings capturing disease dynamics not recoverable by standard autoregressive baselines, demonstrating improved performance on complex tasks with high patient heterogeneity. This breakthrough delivers a paradigm shift in how patient trajectories are understood and modelled.
The core of SMB-Structure lies in a joint-embedding prediction architecture (JEPA) grounded with next-token prediction through supervised fine-tuning. Researchers forced the model to predict future patient states in both token and latent space, compelling the encoding of trajectory dynamics before the next state is observed. Specifically, the encoder must predict the future embedding before observing the future state, ensuring dynamics are embedded within the representation itself. This approach captures temporal relationships crucial for understanding disease progression. The architecture comprises a pretrained large language model backbone extended with domain-specific clinical tokens, a bottleneck predictor for latent-space prediction, and a momentum encoder for stable targets.
The team evaluated the model not at single timepoints but along complete disease trajectories, providing a more holistic assessment of its predictive capabilities. SMB-Structure achieves competitive performance on tasks requiring trajectory-level reasoning, which is essential when dealing with significant patient variability. The training paradigm learns embeddings that capture disease dynamics, enabling the model to forecast future patient states. The work addresses a limitation of existing approaches, which focus primarily on reconstruction, predicting the next token or event rather than explicitly encoding how patient states evolve. Model weights are publicly available on Hugging Face at https://huggingface.co/standardmodelbio/SMB-v1-1.7B-Structure, facilitating further research and development. This advancement has the potential to significantly improve clinical decision support and patient care.
Predicting disease trajectories via longitudinal latent embeddings
Scientists have developed SMB-Structure, a new training paradigm for longitudinal electronic health records (EHR) that combines supervised fine-tuning with a Joint-Embedding Predictive Architecture. This approach aims to bridge the gap between reconstructing clinical documentation and simulating a patient’s disease trajectory over time. By predicting future patient states in latent space before they are observed, the model is encouraged to capture the underlying dynamics of disease progression. Researchers validated SMB-Structure using two large cohorts, encompassing over 40,000 patients with oncology and pulmonary embolism diagnoses.
Evaluation through linear probes at various points in the disease course demonstrated that the training paradigm learns embeddings capable of capturing disease dynamics that are not readily apparent in standard autoregressive models. This resulted in competitive performance on complex tasks involving patients with diverse medical histories. The authors acknowledge limitations including the computational cost of dual forward passes and the current restriction of evaluation to linear probing techniques. Future work will focus on extending the framework to incorporate intervention-conditioned world models, potentially enabling counterfactual reasoning and treatment optimisation.
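The linear-probing protocol mentioned above, training a simple linear classifier on frozen embeddings at points along the disease course, can be sketched as follows. The embeddings here are synthetic and the least-squares classifier is a stand-in for whatever probe the authors used; the sketch only shows the shape of the evaluation.

```python
import numpy as np

# Illustrative linear probe on frozen embeddings. In the real evaluation the
# embeddings come from the trained SMB-Structure encoder at various timepoints;
# here they are synthetic, with a label-dependent mean shift so a linear probe
# can separate the classes.
rng = np.random.default_rng(42)
n, d = 200, 16
labels = rng.integers(0, 2, size=n)                # binary clinical outcome
emb = rng.normal(size=(n, d)) + labels[:, None] * 1.5

train, test = slice(0, 150), slice(150, None)
X = np.hstack([emb, np.ones((n, 1))])              # append a bias column

# Fit the probe on the training split only; embeddings stay frozen.
w, *_ = np.linalg.lstsq(X[train], labels[train], rcond=None)
preds = (X[test] @ w > 0.5).astype(int)
acc = (preds == labels[test]).mean()
print(acc)
```

Repeating this probe at successive points in the trajectory is what lets the authors ask whether dynamics information is present in the embeddings themselves, rather than recovered at decoding time.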
This work establishes a foundation for advanced clinical AI by demonstrating the importance of modelling patient trajectories as dynamical systems rather than static documents. The findings suggest that separating semantic understanding from dynamical modelling yields superior representations of patient histories. While acknowledging potential risks related to bias and overreliance on model predictions, the researchers emphasise the need for careful validation, fairness audits, and physician oversight before deployment.
👉 More information
🗞 The Patient is not a Moving Document: A World Model Training Paradigm for Longitudinal EHR
🧠 ArXiv: https://arxiv.org/abs/2601.22128
