Irregularly sampled time series data presents a significant challenge in critical care, yet applying Large Language Models (LLMs) to this complex information remains largely unstudied. Researchers Feixiang Zheng (The University of Melbourne), Yu Wu, Cecilia Mascolo (University of Cambridge), and colleagues address this gap by systematically evaluating the core components of LLM-based pipelines, namely the time series encoder and the multimodal alignment strategy, on benchmark ICU datasets. Their work reveals that explicitly modelling data irregularity within the encoder is crucial for performance gains, exceeding the improvements achieved through alignment strategies. This research is significant because it highlights both the potential and the current limitations of LLMs for analysing irregular ICU time series, demonstrating that while promising, they currently require substantially longer training and struggle in data-scarce settings compared with established supervised models.
The study addresses a critical gap in current research, which largely focuses on regularly sampled data, by investigating the effectiveness of LLMs when confronted with the complexities of ICU monitoring, characterised by sporadic measurements, high rates of missing values, and asynchronous sampling. The research team established a systematic testbed to evaluate the impact of two key LLM components, the time series encoder and the multimodal alignment strategy, across benchmark ICU datasets and against established supervised and self-supervised baselines. Results reveal that the design of the time series encoder is demonstrably more critical than the alignment strategy for achieving high performance.
Specifically, encoders explicitly designed to model irregularity yielded an average AUPRC increase of 12.8% over standard Transformer architectures. While the multimodal alignment strategy proved less impactful, the best-performing, semantically rich, fusion-based approach still achieved a modest 2.9% improvement over cross-attention mechanisms. The study meticulously compared the performance of four state-of-the-art LLM-based methods, Time-LLM, S2IP, CALF, and FSCA, alongside traditional methods on the task of ICU mortality prediction using two benchmark irregular ICU datasets. This rigorous evaluation provides actionable insights for designing more effective and computationally viable LLM-based time series models for clinical applications.
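AUPRC (area under the precision-recall curve) is the headline metric behind the 12.8% figure; it is well suited to mortality prediction because the positive class is rare. As a minimal illustration, not taken from the paper's code, AUPRC can be computed with the step-wise average-precision formula:

```python
def average_precision(labels, scores):
    """AUPRC via average precision: AP = sum_k (R_k - R_{k-1}) * P_k,
    ranking predictions by descending score. labels are 0/1, scores are
    model probabilities. (Ties in scores are broken arbitrarily here.)"""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(labels)
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / total_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

For a perfectly ranked set of predictions this returns 1.0; a 12.8% relative gain on such a curve reflects substantially better ranking of the rare mortality cases.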
However, the research also highlights current limitations: LLM-based methods require at least ten times longer training than the best-performing irregular supervised models while delivering only comparable performance. Furthermore, these models underperform in data-scarce, few-shot learning scenarios, indicating a need for further optimisation. The work systematically investigates the applicability and limitations of these key components in processing irregular time series, posing research questions regarding LLM generalisation, component impact, computational trade-offs, and few-shot learning effectiveness. This comprehensive empirical study offers a valuable contribution to the field, providing a nuanced understanding of the potential and challenges of leveraging LLMs for critical care data analysis.
Experiments show that the team's framework segments irregular time series, encodes the segments into temporal embeddings, and aligns them with textual embeddings for classification. The research establishes that while LLMs offer promising capabilities for handling complex ICU data, careful attention to encoder design and computational efficiency is needed to realise their full potential. The study focused on two elements critical to successful LLM-based time series modelling: the time series encoder and the multimodal alignment strategy. To rigorously assess their impact, the team established a systematic testbed using benchmark ICU datasets and compared state-of-the-art LLM methods against strong supervised and self-supervised baselines. Experiments employed four LLM-based methods, Time-LLM, S2IP, CALF, and FSCA, alongside traditional techniques for ICU mortality prediction.
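The first step of the pipeline described above, segmenting an irregular series, can be sketched as follows. This is an illustrative sketch, not the paper's implementation: it splits measurements into fixed-duration windows while preserving each measurement's raw timestamp, so downstream encoders see the true sampling gaps rather than a resampled grid.

```python
def segment_irregular(times, values, window):
    """Split an irregularly sampled series into fixed-duration windows.
    Each segment keeps its own (time, value) pairs, so an encoder can
    observe within-window sampling gaps instead of an imputed grid."""
    segments = {}
    for t, v in zip(times, values):
        segments.setdefault(int(t // window), []).append((t, v))
    return [segments[k] for k in sorted(segments)]
```

For example, heart-rate readings at hours 0.5, 1.2, 3.7, and 4.1 with a 2-hour window produce three segments of unequal length, which is exactly the irregularity the encoders under study must handle.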
The study pioneered a comparative analysis of encoder designs, revealing that explicitly modelling irregularity yielded substantial performance gains, achieving an average AUPRC increase of 12.8% over a vanilla Transformer architecture. Researchers meticulously evaluated various encoders, quantifying their ability to handle the high rates of missing values characteristic of ICU data. Furthermore, the team assessed multimodal alignment strategies, discovering that a semantically rich, fusion-based approach delivered a modest 2.9% improvement over cross-attention mechanisms. This detailed comparison enabled precise quantification of the contribution of each component to overall model performance.
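One common way encoders "explicitly model irregularity", used here purely as an illustrative sketch rather than the specific designs evaluated in the paper, is to embed each measurement's continuous timestamp instead of its integer position, so uneven gaps between ICU readings are reflected directly in the representation:

```python
import math

def time_embedding(t, dim=8, max_timescale=1e4):
    """Continuous-time sinusoidal embedding. Unlike positional indices
    in a vanilla Transformer, this is computed from the actual timestamp,
    so two measurements 10 minutes apart and two measurements 10 hours
    apart get distinguishably different encodings."""
    emb = []
    for i in range(dim // 2):
        freq = 1.0 / (max_timescale ** (2 * i / dim))
        emb += [math.sin(t * freq), math.cos(t * freq)]
    return emb
```

A vanilla Transformer given positions 0, 1, 2, ... discards the gap information entirely; the 12.8% AUPRC gap reported above suggests that this lost signal is clinically meaningful.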
Crucially, the work highlighted a significant trade-off: LLM-based methods required at least ten times longer training than the best-performing irregular supervised models, despite achieving only comparable performance. The team measured training times precisely, demonstrating the computational cost associated with LLM implementation. Moreover, experiments revealed that LLMs underperformed in data-scarce, few-shot learning scenarios, indicating limitations in their ability to generalise from limited data. This systematic evaluation, conducted across two benchmark irregular ICU datasets, provides actionable insights into the promise and current limitations of LLMs for irregular time series analysis in critical care settings. The code developed during this research is publicly available, facilitating further investigation and reproducibility, at https://github.com/mHealthUnimelb/LLMTS.
Irregularity-Modelling Encoders Significantly Boost ICU Data Performance
Scientists achieved a 12.8% average AUPRC increase using encoders that explicitly model irregularity, surpassing the performance of vanilla Transformer models on benchmark ICU datasets. Experiments revealed that the encoder design is demonstrably more critical than the alignment strategy for effective modelling of these complex datasets. The team measured performance gains across various state-of-the-art LLM-based methods, comparing them against strong supervised and self-supervised baselines.
Results demonstrate that semantically rich, fusion-based alignment strategies achieved a modest 2.9% improvement over cross-attention techniques, highlighting the value of incorporating contextual information. However, the study also quantified a significant computational cost, finding that LLM-based methods require at least ten times longer training than the best-performing irregular supervised models to achieve comparable performance. Data shows that LLM-based methods underperform in data-scarce, few-shot learning settings, indicating a limitation in their ability to generalise with limited data availability. Researchers established a systematic testbed to rigorously assess the impact of these components, utilising two benchmark irregular ICU datasets for mortality prediction tasks.
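The two alignment families compared above can be sketched in miniature. This is an illustrative, dimension-reduced sketch under the assumption that cross-attention lets the time-series query mix over text-token embeddings, while fusion simply presents both modalities jointly; it is not the paper's implementation.

```python
import math

def _softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(query, keys, values):
    """Single attention head: the time-series embedding (query) attends
    over text embeddings (keys/values) and returns their weighted mix."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = _softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def fusion_align(ts_emb, text_emb):
    """Fusion-style alignment: concatenate the modalities so the LLM
    processes both jointly instead of mixing via attention weights."""
    return ts_emb + text_emb
```

The reported 2.9% edge for fusion suggests that letting the LLM see the semantic context directly, rather than through attention-weighted mixing, helps modestly on these datasets.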
Measurements confirm that while LLMs hold promise for irregular time series analysis, their current limitations call for further research into efficient training methodologies and data augmentation techniques. The work delivers valuable insights into the trade-offs between computational complexity and performance gains when applying LLMs to critical care data. Tests show that explicitly modelling irregularity within the encoder is paramount for achieving substantial improvements in AUPRC, highlighting the need for specialised encoder designs tailored to the unique characteristics of irregular time series and paving the way for more robust and accurate patient monitoring systems. While alignment strategies also contribute, their effect is less pronounced, with fusion-based approaches showing only modest improvements over cross-attention mechanisms.
However, researchers found that LLM-based methods require considerably longer training times, at least ten times more, than the best-performing irregular supervised models, yet deliver only comparable performance. Furthermore, these models underperform when data is limited, suggesting a potential limitation in adapting LLMs to sparse, irregular time series. The study highlights the promise of LLMs for ICU data analysis, but also acknowledges their current computational demands and challenges in generalising with limited data. The authors note that LLMs, pretrained on extensive text corpora, may struggle to learn effectively from limited, irregular time series data, unlike more efficient, domain-specific models. Future work should focus on developing architectures that balance performance gains with computational cost, particularly for resource-constrained clinical environments.
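Few-shot evaluations of the kind described above are typically built by subsampling the training labels. As a minimal sketch of that setup (the exact protocol and shot counts are assumptions, not taken from the source), a balanced k-shot split can be constructed like this:

```python
import random

def few_shot_subset(records, labels, k_per_class, seed=0):
    """Build a balanced few-shot training split: keep at most k examples
    per class. A fixed seed keeps the split reproducible across runs."""
    rng = random.Random(seed)
    by_class = {}
    for r, y in zip(records, labels):
        by_class.setdefault(y, []).append(r)
    subset = []
    for y, rs in sorted(by_class.items()):
        rng.shuffle(rs)
        subset += [(r, y) for r in rs[:k_per_class]]
    return subset
```

Running both the LLM-based methods and the supervised baselines on such shrinking subsets is what exposes the generalisation gap the authors report.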
👉 More information
🗞 Rethinking Large Language Models For Irregular Time Series Classification In Critical Care
🧠 ArXiv: https://arxiv.org/abs/2601.16516
