Los Alamos National Laboratory researchers have developed a tool that detects, with high accuracy, cases in which an artificial intelligence model describes objects that are absent from an input image. The Prelim Attention Score, or PAS, monitors autoregressive vision-language models as they generate output, tracing where each piece of information comes from and flagging potential errors during generation rather than afterward. “The PAS is a real-time, plug-and-play metric that acts as an internal monitor for the AI,” explains Manish Bhattarai, a Los Alamos computer scientist. Unlike many detection methods, PAS can be integrated into existing systems immediately and with minimal computational overhead, giving developers a practical path toward more trustworthy multimodal AI. Potential applications range from medical imaging to scientific document analysis.

Prelim Attention Score Detects Hallucinations in Vision-Language Models

A new metric developed at Los Alamos National Laboratory offers a significant step toward mitigating the pervasive problem of hallucinations in vision-language artificial intelligence systems. The Prelim Attention Score, or PAS, provides a real-time assessment of whether a model’s output aligns with the provided image or stems from internally generated, potentially inaccurate, text. These autoregressive vision-language models, which generate responses token by token, are increasingly used for tasks combining visual and textual data, but are prone to fabricating details not present in the original input. Researchers tackled this issue by focusing on how these models construct their answers, identifying a critical point for error detection during the generation process itself.

PAS works by monitoring a vision-language model as it predicts each token, revealing where the model draws its information from and where hallucinations are most likely to emerge. It uses the attention patterns inherent in transformer architectures, a common deep-learning approach, to analyze how the model weighs information from the image, the initial text prompt, and its own developing output. Because the metric relies on attention patterns the model already produces and can be plugged into existing systems, it lowers the barrier to adoption for developers seeking to improve the reliability of their models. The Los Alamos team reports that PAS detects these visual inconsistencies more accurately than existing methods.
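The attention bookkeeping described above can be illustrated with a minimal sketch. It is not the published PAS formula: the function name `attention_shares`, the token ordering (image, then prompt, then generated text), and the sample weights are all assumptions made for illustration. The sketch simply splits one token's attention weights into the share spent on each of the three information sources.

```python
from typing import Dict, Sequence


def attention_shares(weights: Sequence[float], n_image: int, n_prompt: int) -> Dict[str, float]:
    """Split one attention row into image, prompt, and generated-text shares.

    Assumption (illustrative only): `weights` is ordered as
    [image tokens | prompt tokens | previously generated tokens].
    """
    if n_image < 0 or n_prompt < 0 or n_image + n_prompt > len(weights):
        raise ValueError("segment sizes do not fit the attention row")
    total = sum(weights)
    if total <= 0:
        raise ValueError("attention row must have positive mass")
    image = sum(weights[:n_image]) / total
    prompt = sum(weights[n_image:n_image + n_prompt]) / total
    generated = sum(weights[n_image + n_prompt:]) / total
    return {"image": image, "prompt": prompt, "generated": generated}


# Hypothetical attention row: 2 image tokens, 2 prompt tokens, 2 generated tokens.
row = [0.30, 0.30, 0.10, 0.10, 0.15, 0.05]
shares = attention_shares(row, n_image=2, n_prompt=2)
print(shares)
```

In this toy row most of the attention falls on the image tokens. A row dominated by the "generated" share would correspond to the situation the article describes, in which the model leans on its own earlier words.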

The system generates a score indicating the likelihood of a hallucination, with values closer to zero suggesting a stronger grounding in the input image. “By understanding the way a vision-language model pays attention to preliminary information, PAS can help identify the exact instance where a model begins to over-rely on its own words,” said Xuan Nhat Hoang, a Los Alamos intern, highlighting the tool’s ability to pinpoint the origin of the error. Potential applications span numerous fields, including medical imaging, scientific document analysis, and remote sensing, where accurate visual claims are paramount for informed decision-making. The team will present their findings at the Computer Vision and Pattern Recognition conference this month.

Transformer Architectures Enable Real-Time Hallucination Monitoring with PAS

Autoregressive vision-language models, increasingly deployed in applications from medical imaging to remote sensing, generate their outputs incrementally, a characteristic that presents both an opportunity and a challenge for detecting inaccuracies. This sequential generation process, built on transformer architectures, allows information sources to be monitored in real time: rather than identifying hallucinations after a response is complete, developers can pinpoint the moment a model begins to stray from verifiable data. The system’s ability to work with existing models is a significant advantage, because it avoids a costly and time-consuming redesign of the whole system. PAS examines how these transformer-based models attend to three information streams (the input image, any provided text prompt, and the model’s own developing output) and calculates an attention-based score from them. This score quantifies the degree to which the model grounds its responses in the provided image rather than fabricating details from its internal parameters.

A score approaching zero indicates strong reliance on the input image and therefore a lower probability of hallucination, while higher values signal potential inconsistencies. The team reports that PAS catches hallucinations more accurately than existing methods, which could make multimodal AI systems more trustworthy.
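As a rough illustration of how such a score could be used during generation, the sketch below scans per-token scores as they arrive and reports the first step that crosses a cutoff. The function name `first_flagged_step`, the threshold of 0.5, and the sample scores are hypothetical; the article does not specify a threshold or how the team turns scores into decisions.

```python
from typing import Iterable, Optional


def first_flagged_step(scores: Iterable[float], threshold: float) -> Optional[int]:
    """Return the index of the first token whose score exceeds `threshold`.

    Scores are consumed one at a time, so the check can run alongside
    token-by-token generation. Returns None when no token is flagged.
    """
    for step, score in enumerate(scores):
        if score > threshold:
            return step
    return None


# Hypothetical per-token scores; values near zero mean better grounding in the image.
token_scores = [0.05, 0.08, 0.12, 0.61, 0.70]
flagged = first_flagged_step(token_scores, threshold=0.5)
print(flagged)
```

Returning the first flagged step, rather than a single verdict on the whole response, mirrors the article's point that the score can locate where a model begins to over-rely on its own words.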