Large language models readily generate statements about their own internal processes, prompting investigation into whether this constitutes introspection. Analysing examples of model self-reports, the research identifies one case – a model accurately inferring the temperature parameter used to generate its output – as a minimal example of introspection, one that is distinct from merely descriptive output and likely does not involve conscious experience.
The capacity for self-awareness, traditionally considered a hallmark of consciousness, is increasingly being explored in artificial intelligence. Recent advances in large language models (LLMs) have yielded systems capable of generating remarkably human-like text, including statements about themselves – their processes, limitations, and even internal ‘states’. This prompts a fundamental question: can these self-reports be meaningfully equated with introspection, the human capacity for examining one’s own thoughts and feelings? Iulia M. Comșa of Google DeepMind and Murray Shanahan of Google DeepMind and Imperial College London address this issue in their paper, “Does It Make Sense to Speak of Introspection in Large Language Models?”, in which they critically examine examples of self-reporting in LLMs, distinguishing genuine inference from mere algorithmic mimicry.
Assessing Generative Model Behaviour: Temperature and the Prioritisation of Form
Recent research investigates the capacity of large language models (LLMs) to discern the ‘temperature’ parameter used during text generation. Temperature, in the context of LLMs, controls the randomness of the output; higher temperatures yield more diverse and potentially creative text, while lower temperatures produce more predictable and conservative responses. The study demonstrates that LLMs frequently, though not invariably, correctly identify whether a given text sample was generated using a high or low temperature setting.
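As a rough illustration of what the temperature parameter does (a minimal sketch with toy values, not the authors’ experimental code), the snippet below applies temperature scaling to a handful of logits before sampling: dividing by a small temperature sharpens the distribution, while a large temperature flattens it.

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng=np.random.default_rng(0)):
    """Sample one token index from raw logits after temperature scaling."""
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    scaled -= scaled.max()                      # subtract max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)

# Toy logits for a four-token vocabulary (hypothetical values).
logits = [2.0, 1.0, 0.2, -1.0]
low_t  = [sample_with_temperature(logits, 0.2) for _ in range(10)]   # mostly index 0
high_t = [sample_with_temperature(logits, 1.5) for _ in range(10)]   # far more varied
print(low_t, high_t)
```

At a temperature of 0.2 nearly every draw picks the highest-scoring token, whereas at 1.5 the samples spread across the vocabulary; that difference in diversity is what the classification task in the study turns on.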
However, the instances where LLMs misclassify these samples offer a more nuanced understanding of their internal processes. Researchers found that LLMs sometimes prioritise grammatical correctness and stylistic coherence over factual accuracy when evaluating text. This suggests a potential bias towards form over substance in their assessment criteria.
The difficulty LLMs exhibit in self-assessment – evaluating their own generated output – is particularly revealing. The process appears to induce a form of recursive analysis, where the model repeatedly processes the text, potentially amplifying minor inconsistencies or errors. This can lead to misclassification, even when the temperature setting is readily apparent to a human observer.
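To make the self-assessment setup concrete, here is a hypothetical sketch of such a probe – not the paper’s protocol, and using the small “gpt2” model purely as a stand-in for the much larger LLMs studied: the model generates a continuation at a known temperature and is then prompted to judge whether its own output looks high- or low-temperature.

```python
from transformers import pipeline

# Small stand-in model for illustration only; the study concerns much larger LLMs.
generator = pipeline("text-generation", model="gpt2")

def self_assess_temperature(prompt, temperature):
    # Step 1: generate a sample at a known, ground-truth temperature.
    sample = generator(
        prompt, max_new_tokens=60, do_sample=True, temperature=temperature
    )[0]["generated_text"]

    # Step 2: feed the sample back and ask the model to classify it.
    judge_prompt = (
        "Here is a passage of generated text:\n\n"
        f"{sample}\n\n"
        "Was this passage generated with a HIGH or a LOW temperature? Answer:"
    )
    verdict = generator(judge_prompt, max_new_tokens=5, do_sample=False)
    return sample, verdict[0]["generated_text"][len(judge_prompt):].strip()

text, guess = self_assess_temperature("The weather today is", temperature=1.5)
print(guess)  # compare the model's guess against the known setting (1.5 → "HIGH")
```

Comparing the model’s verdict against the temperature actually used gives a simple accuracy measure, and the misclassified cases are where the kind of rationalisation described above tends to surface.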
Furthermore, the study highlights instances where LLMs attempt to rationalise outputs that are, objectively, nonsensical. This behaviour indicates a disconnect between the model’s internal reasoning processes and the actual content it generates. The model appears to construct a narrative justification, even when the underlying information lacks coherence.
These findings underscore a fundamental difference between LLM reasoning and human cognition. While LLMs excel at pattern recognition and statistical prediction, they lack the contextual understanding and common-sense reasoning that underpin human evaluation. The research suggests that assessing an LLM’s ability to evaluate its own output provides a valuable window into the limitations of its cognitive architecture.
👉 More information
🗞 Does It Make Sense to Speak of Introspection in Large Language Models?
🧠 DOI: https://doi.org/10.48550/arXiv.2506.05068
