The ability of large language models (LLMs) to express confidence alongside their outputs is increasingly used to build user trust, yet these models frequently demonstrate overconfidence that does not correlate with actual accuracy. Yuxi Xia, Loris Schoenegger, and Benjamin Roth, from the Faculty of Computer Science at the University of Vienna, address this issue by investigating where such verbalized confidence comes from. They introduce TracVC, a novel method utilising information retrieval and influence estimation to pinpoint the training data that informs an LLM’s stated confidence. Through evaluations of OLMo and Llama models, the researchers propose a new metric, content groundness, and reveal that models such as OLMo2-13B often base confidence on superficial linguistic cues rather than relevant content, highlighting a fundamental flaw in current training practices. This work offers crucial insights for developing LLMs that not only sound confident but also express justified and reliable assessments of their own outputs.
Tracing LLM Confidence to Training Data
Scientists demonstrate a novel approach to understanding the origins of confidence expressed by large language models (LLMs), addressing a critical issue of reliability in artificial intelligence. The research team introduced TracVC, a method leveraging information retrieval and influence estimation to trace how LLMs generate confidence expressions back to specific data within their training sets. This breakthrough reveals whether LLMs ground their confidence in relevant content or simply mimic linguistic patterns associated with certainty. Experiments were conducted on both OLMo and Llama models, utilising a question answering framework to assess the basis of their stated confidence levels.
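As a rough illustration of this question answering setup, a prompt that elicits verbalized confidence might look like the minimal sketch below; the template wording and the example question are assumptions, not the exact prompt used in the study.

```python
# Hypothetical prompt template for eliciting verbalized confidence in a QA
# setting; the wording here is an illustrative assumption.
PROMPT_TEMPLATE = (
    "Answer the following question and state how confident you are "
    "in your answer as a percentage.\n"
    "Question: {question}\n"
    "Answer:"
)

def build_prompt(question: str) -> str:
    """Fill the template with a concrete question."""
    return PROMPT_TEMPLATE.format(question=question)

if __name__ == "__main__":
    print(build_prompt("Who wrote Hamlet?"))
    # The model's completion would contain both an answer and a statement
    # such as "I am 90% confident" -- the verbalized confidence that
    # TracVC traces back to training data.
```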
The study unveils a new metric, termed ‘content groundness’, which quantifies the extent to which an LLM bases its confidence on content-related training examples (those directly relevant to the question and answer) rather than on generic examples of confidence phrasing. Analysis of OLMo2-13B specifically demonstrates frequent influence from confidence-related data that shares no lexical connection to the query itself, suggesting the model may be imitating superficial expressions of certainty rather than relying on genuine content understanding. These findings point to a fundamental limitation within current LLM training regimes, where models can learn to sound confident without necessarily learning when confidence is justified. Researchers applied TracVC to eleven open-source LLMs, including various OLMo and Llama instruction models, evaluating performance across five question answering benchmarks.
The team adapted the gradient-based attribution method TracIn to estimate the influence of retrieved training examples on confidence generation, comparing the impact of content-related and confidence-related data. Results demonstrate that larger LLMs do not necessarily exhibit higher content groundness than smaller models, leading to the hypothesis that increased model capacity may heighten sensitivity to superficial patterns within the training data. Further investigation revealed that content groundness increases when LLMs correctly answer questions, suggesting a stronger reliance on relevant training examples when the response is accurate. Moreover, post-training techniques were shown to impact content groundness differently across various LLMs, highlighting the complex interplay between training methodology and model behaviour. This work establishes a data-driven perspective on model confidence, offering crucial insights to guide future training approaches and improve the trustworthiness of confidence expressed by large language models.
How TracVC Works
The study pioneers TracVC, a novel method for tracing the origins of verbalized confidence in large language models back to their training data. This approach leverages information retrieval and influence estimation techniques to pinpoint the specific training examples that most strongly influence a model’s expression of confidence. For each test instance, comprising a question-answer pair and the LLM’s stated confidence level, TracVC retrieves two distinct sets of ten relevant training examples: one set is selected for lexical similarity to the content (the question and answer itself), while the other is selected for similarity to the confidence expression.
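A plausible instantiation of this retrieval step is sketched below, assuming BM25 as the lexical ranker; the specific ranker, tokenisation, and toy corpus are illustrative assumptions, and only the two-query structure mirrors the description above.

```python
# Illustrative lexical retrieval step using BM25 (an assumption; the paper's
# exact retrieval component may differ). Requires: pip install rank_bm25
from rank_bm25 import BM25Okapi

training_corpus = [
    "Q: What is the capital of France? A: Paris. I am 90% confident.",
    "I am highly confident in this answer.",
    "Q: Who painted the Mona Lisa? A: Leonardo da Vinci.",
]
tokenised = [doc.lower().split() for doc in training_corpus]
bm25 = BM25Okapi(tokenised)

# Two queries per test instance: one over the question-answer content,
# one over the verbalized confidence expression.
content_query = "what is the capital of france paris".split()
confidence_query = "i am 90% confident".split()

content_set = bm25.get_top_n(content_query, training_corpus, n=10)
confidence_set = bm25.get_top_n(confidence_query, training_corpus, n=10)
print(content_set[0], confidence_set[0], sep="\n")
```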
Scientists then employ an adapted gradient-based attribution method, TracIn, to estimate the influence of each retrieved training example on the model’s confidence generation. By comparing the influence scores between the content-related and confidence-related example sets, the research reveals whether the LLM grounds its confidence in factual content or relies more heavily on superficial cues related to confidence phrasing. Experiments were conducted on eleven open-source LLMs, including OLMo and Llama instruction models, utilising publicly available training corpora and checkpoints refined with techniques such as direct preference optimization. The study evaluated these models across five question answering benchmarks, introducing a new metric called content groundness to quantify the degree to which confidence is rooted in content-related training data.
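In spirit, a TracIn-style influence score is the inner product of the loss gradients on a training example and on the test instance, scaled by the learning rate and, in the original formulation, summed over several training checkpoints. The single-checkpoint sketch below illustrates the idea; the stand-in model, example texts, and simplifications (full-parameter gradients, one checkpoint) are assumptions rather than the paper's exact adaptation.

```python
# Minimal single-checkpoint sketch of a TracIn-style influence score.
# Model choice, texts, and simplifications are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def loss_gradient(model, tokenizer, text):
    """Flattened gradient of the LM loss on `text` w.r.t. all parameters."""
    model.zero_grad()
    enc = tokenizer(text, return_tensors="pt")
    loss = model(**enc, labels=enc["input_ids"]).loss
    loss.backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()
                      if p.grad is not None])

def tracin_influence(model, tokenizer, train_text, test_text, lr=1e-5):
    """TracIn approximates influence as lr * <grad(train), grad(test)>."""
    g_train = loss_gradient(model, tokenizer, train_text)
    g_test = loss_gradient(model, tokenizer, test_text)
    return (lr * torch.dot(g_train, g_test)).item()

tok = AutoTokenizer.from_pretrained("gpt2")        # small stand-in model
lm = AutoModelForCausalLM.from_pretrained("gpt2")

score = tracin_influence(
    lm, tok,
    train_text="Q: Who wrote Hamlet? A: Shakespeare. I am 95% confident.",
    test_text="Q: Who wrote Macbeth? A: Shakespeare. I am 90% confident.",
)
print(score)
```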
Content groundness is defined as the proportion of instances where content-related examples exert greater influence than confidence-related examples. Analysis revealed that OLMo2-13B frequently exhibits influence from confidence-related data lexically unrelated to the query, suggesting a tendency to mimic linguistic expressions of certainty rather than genuine content grounding. Furthermore, the work demonstrates that larger LLMs do not necessarily exhibit higher content groundness, proposing that increased model capacity may heighten sensitivity to superficial patterns within the training data. Researchers also found that content groundness increases when evaluating correctly answered questions, hypothesising that LLMs are more likely to ground confidence in relevant content when they have encountered similar examples during training. This innovative methodology provides a foundation for improving the trustworthiness of LLMs by enabling more reliable confidence expression.
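Given that definition, content groundness reduces to a simple proportion over test instances. The sketch below computes it from per-example influence scores; aggregating each retrieval set by its mean influence is an assumption, as the paper may use a different aggregation.

```python
# Sketch of the content-groundness computation under the definition above:
# the fraction of test instances whose content-related retrieved examples
# are more influential than their confidence-related ones. Aggregating each
# set by its mean influence is an assumption.
from statistics import mean

def content_groundness(instances):
    """`instances` holds per-example influence scores for the content-related
    and confidence-related retrieval sets of each test instance."""
    grounded = [
        mean(inst["content_influences"]) > mean(inst["confidence_influences"])
        for inst in instances
    ]
    return sum(grounded) / len(grounded)

toy_instances = [
    {"content_influences": [0.8, 0.5], "confidence_influences": [0.2, 0.1]},
    {"content_influences": [0.1, 0.0], "confidence_influences": [0.4, 0.3]},
]
print(content_groundness(toy_instances))  # 0.5: content wins on 1 of 2 instances
```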
👉 More information
🗞 Influential Training Data Retrieval for Explaining Verbalized Confidence of LLMs
🧠 ArXiv: https://arxiv.org/abs/2601.10645
