The ability to perceive and understand time is fundamental to human cognition, yet recent work suggests that artificial intelligence can exhibit strikingly similar temporal reasoning. Lingyu Li and Yixu Wang of the Shanghai Artificial Intelligence Laboratory, Yang Yao of The University of Hong Kong, and colleagues investigate how large language models process time, finding that these systems spontaneously develop a subjective sense of temporal reference. The research demonstrates that the models adhere to established principles of human temporal cognition, such as the Weber-Fechner law, and utilise internal mechanisms, including specialised neurons and hierarchical data processing, to construct a non-linear representation of time. The finding is significant because it suggests that artificial intelligence can develop cognitive frameworks without being explicitly programmed to do so, potentially leading to unforeseen ways of ‘thinking’, and it highlights the need for alignment approaches that guide the development of these internal representations.
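For readers unfamiliar with it, the Weber-Fechner law holds that perceived magnitude grows logarithmically with stimulus intensity: p = k ln(S/S0), where S is the stimulus intensity, S0 the detection threshold, and k a scaling constant. Equal ratios of intensity therefore produce equal steps in sensation, and it is this kind of compressed, non-linear scaling that the models appear to apply to time.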
LLMs Model Subjective Time and Distance
This research investigates whether large language models (LLMs) exhibit a sense of subjective time and distance that mirrors human perception. Specifically, the study asks whether LLMs develop internal reference points when evaluating the similarity of years or numbers, which would suggest an understanding beyond simple data processing. The researchers prompted LLMs to rate the similarity between pairs of years and of numbers, then quantified these judgements by analysing the cosine similarity of embeddings. Cosine similarity, a measure of the angle between two vectors, gives a numerical estimate of how closely related the model perceives two inputs to be, with values closer to one indicating greater similarity. They compared the models’ responses against predictions from several distance metrics: a linear scale; string edit distance, which counts the minimum number of operations needed to transform one string into another; and a reference-log-linear distance centred on an internal reference point. This reference point represents a year or number the model uses as a baseline for comparison, a potential subjective anchor within its internal representation of time or quantity.
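To make the candidate metrics concrete, here is a minimal Python sketch of all three, plus the cosine similarity used to score the judgements. The cosine and edit-distance functions are standard; the reference-log-linear form shown, a log-compressed distance from a shared reference point `ref`, is one plausible parameterisation for illustration, not necessarily the paper’s exact formula.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (1 = identical direction)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def linear_distance(a, b):
    """Plain numerical distance on a linear scale."""
    return abs(a - b)

def edit_distance(s, t):
    """Levenshtein distance: minimum insertions/deletions/substitutions
    needed to turn string s into string t (one-row dynamic programming)."""
    dp = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        prev, dp[0] = dp[0], i
        for j, ct in enumerate(t, start=1):
            prev, dp[j] = dp[j], min(dp[j] + 1,          # deletion
                                     dp[j - 1] + 1,      # insertion
                                     prev + (cs != ct))  # substitution
    return dp[-1]

def ref_log_linear_distance(a, b, ref):
    """One plausible reference-log-linear form (an assumption, not the
    paper's formula): compare values on a log scale after re-centring
    both on the reference point ref."""
    return abs(np.log1p(abs(a - ref)) - np.log1p(abs(b - ref)))

# With ref = 2025, a 50-year gap far from the reference reads as
# *smaller* than a 10-year gap near it: log-compression in action.
print(ref_log_linear_distance(1800, 1850, 2025))  # ~0.25
print(ref_log_linear_distance(2000, 2010, 2025))  # ~0.49
```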
A diverse range of LLMs, including Llama 3, Qwen, and models from OpenAI and Google, were tested, and a sliding-window method identified the region of highest perceptual differentiation, pinpointing each model’s subjective temporal reference point. The sliding window systematically varied the comparison point to find where a model’s similarity judgements changed most sharply, effectively revealing its internal anchor for temporal or numerical comparison. The results show that a linear scale predicts LLM judgements well for numbers, matching the expectation that numerical distance is often perceived linearly. For years, however, the reference-log-linear distance significantly improves predictive accuracy, particularly in larger models. This suggests that LLMs are not simply applying a linear scale to years but are developing an internal reference point for temporal judgement, a non-linear perception of time in which certain years act as anchors for their evaluations. The study also found that the semantic distances between years within the training data are best explained by the reference-log-linear model, suggesting that models internalise a pre-existing non-linear temporal structure from their exposure to vast amounts of text. The model’s perception of time is therefore not arbitrary but grounded in the statistical patterns of its training data, reflecting the relative frequency and co-occurrence of different years in historical texts and narratives.
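A sketch of the sliding-window idea follows; the window width, input format, and scoring rule here are illustrative assumptions, and the paper’s exact procedure may differ in detail. Given the model’s similarity judgement for each pair of adjacent years, the sketch scans for the window in which neighbouring years are judged least similar, i.e. where discrimination is sharpest, and takes its centre as the inferred reference point.

```python
import numpy as np

def find_reference_point(years, adjacent_sim, width=10):
    """Locate the region of highest perceptual differentiation.

    years:        years y_0 < y_1 < ... (evenly spaced)
    adjacent_sim: the model's similarity judgement for each (y_i, y_{i+1})
                  pair, so len(adjacent_sim) == len(years) - 1
    width:        sliding-window size, in adjacent pairs (illustrative)

    Low similarity between neighbours means the model discriminates
    strongly there; the centre of the lowest-similarity window is taken
    as the subjective temporal reference point.
    """
    sims = np.asarray(adjacent_sim, dtype=float)
    # mean similarity inside every window of `width` consecutive pairs
    window_means = np.convolve(sims, np.ones(width) / width, mode="valid")
    i = int(np.argmin(window_means))  # most differentiated window
    return years[i + width // 2]      # its centre year

# Hypothetical usage: the judgements for adjacent years 1950..2050 would
# come from prompting the model as described above, e.g.
# years = list(range(1950, 2051))
# adjacent_sim = [judge_similarity(y, y + 1) for y in years[:-1]]
# print(find_reference_point(years, adjacent_sim))
```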
Furthermore, larger models show a stronger tendency to adopt the reference-log-linear distance, indicating that a subjective temporal scale may be an emergent property of model size and complexity. This is significant because the capacity for non-linear temporal reasoning is not explicitly programmed into these models; it arises spontaneously as they scale and are exposed to more data, the greater capacity of larger models allowing them to capture subtler patterns and relationships in the training corpus. Taken together, the results provide compelling evidence for emergent cognitive abilities in LLMs. The subjective temporal scale arises from training on extensive text data, suggesting that these systems are not merely performing statistical pattern matching but are building internal representations of concepts such as time and distance, the raw material of the world models they construct for prediction and reasoning. The emergence of subjective scales could be a step towards AI systems that are better aligned with human cognition and interact with the world in a more natural and intuitive way, with more robust reasoning and a greater capacity for common-sense understanding.
The study also implicitly raises questions about biases embedded in the training data, since the location of the subjective temporal reference point could reflect cultural or historical biases in the corpus. For example, a model trained primarily on Western historical texts might weight certain dates or periods more heavily, skewing its perception of time; identifying and mitigating such biases is crucial for ensuring that AI systems are fair and equitable. The study’s strengths lie in its rigorous methodology: multiple distance metrics, a diverse set of LLMs, and non-parametric analysis, which avoids assumptions about the underlying distribution of the data and so strengthens the robustness of the conclusions. Future research could probe the nature of these emergent cognitive abilities further, for instance by investigating whether the subjective scales can be manipulated or refined to improve LLM performance on tasks requiring temporal reasoning. In sum, this insightful and thought-provoking study provides compelling evidence with important implications for our understanding of AI, and it adds to the growing body of evidence that LLMs do more than mimic human language: they develop internal representations of the world that allow them to reason, learn, and adapt in ways previously thought exclusive to biological intelligence.
👉 More information
🗞 The Other Mind: How Language Models Exhibit Human Temporal Cognition
🧠 DOI: https://doi.org/10.48550/arXiv.2507.15851
