Image Realness Assessment and Localization with Multimodal Features Accurately Identifies Inconsistencies in AI-generated Images Using Textual Descriptions

Determining how realistic AI-generated images appear, and pinpointing the flaws within them, is a significant challenge in advancing artificial intelligence. Lovish Kaushik, Agnij Biswas, and Somdyuti Paul, all from the Indian Institute of Technology, Kharagpur, address this problem with a new framework that both assesses overall image realism and identifies the specific regions where inconsistencies occur. Their approach leverages textual descriptions of visual flaws, generated by a vision-language model trained to recognize such imperfections, effectively mimicking the detailed feedback humans would provide. This multimodal technique improves the accuracy of realness prediction and produces detailed maps highlighting which parts of an image appear most and least convincing, a crucial step towards more photorealistic AI generation.

The framework's objective realness assessment and local inconsistency identification rely on textual descriptions of visual inconsistencies generated by vision-language models trained on large datasets, which serve as reliable substitutes for human annotations. The results demonstrate that the proposed multimodal approach improves objective realness prediction and produces dense realness maps that effectively distinguish realistic from unrealistic spatial regions.
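
As a concrete illustration of the annotation step, the sketch below shows how such inconsistency descriptions could be obtained from an off-the-shelf open-source vision-language model. The choice of LLaVA (via Hugging Face transformers), the prompt wording, the input path, and the generation settings are all assumptions made for illustration; the paper's exact models and prompts may differ.

```python
# Hypothetical sketch: prompting an open-source VLM (here LLaVA, an assumed
# stand-in) to describe visual inconsistencies in an AI-generated image.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "llava-hf/llava-1.5-7b-hf"  # assumed model choice
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(MODEL_ID)

image = Image.open("generated_image.png")  # hypothetical input path
# Prompt wording is illustrative; the paper's prompts are not reproduced here.
prompt = ("USER: <image>\nDescribe any visually inconsistent or unrealistic "
          "details in this image, such as distorted anatomy, implausible "
          "lighting, or malformed objects. ASSISTANT:")

inputs = processor(images=image, text=prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)
description = processor.decode(output_ids[0], skip_special_tokens=True)
print(description)  # free-text description used in place of a human annotation
```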

Realness and Localization with Multimodal Learning

Scientists are developing new methods to automatically assess the quality and realism of images created by artificial intelligence. This research builds a system that not only scores an image's believability but also explains why it appears realistic or flawed by identifying unrealistic details, a capability that grows more important as AI image generation becomes more prevalent. The team introduces REALM, a framework that combines visual information from the image itself with textual descriptions of its content, using deep learning and large language models to exploit the complementary strengths of the two modalities. A key feature of REALM is its ability to pinpoint unrealistic regions within an image: it produces dense realness maps that indicate the level of realism across different areas, providing a detailed, pixel-level assessment of image quality. Experiments demonstrate that combining visual and textual information significantly outperforms relying on either type of data alone, underscoring the importance of multimodal learning for accurate realness assessment.
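
To make the multimodal fusion idea concrete, here is a minimal sketch of how image features and description features could be combined into a single realness score. The CLIP encoders, the tiny regression head, and the example inputs are assumptions for illustration only; REALM's actual backbones, fusion strategy, and training objective are described in the paper.

```python
# Minimal sketch of cross-modal realness regression: fuse CLIP image features
# with features of a VLM-generated inconsistency description, then regress a
# scalar realness score. Encoder choice and head architecture are assumptions.
import torch
import torch.nn as nn
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

class RealnessHead(nn.Module):
    """Tiny MLP mapping concatenated image+text embeddings to a score in [0, 1]."""
    def __init__(self, dim=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, img_emb, txt_emb):
        return self.mlp(torch.cat([img_emb, txt_emb], dim=-1))

image = Image.open("generated_image.png")        # hypothetical input
description = "the left hand has six fingers"    # e.g. from the VLM step above
inputs = proc(text=[description], images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    out = clip(**inputs)
head = RealnessHead()  # would be trained against human realness ratings
score = head(out.image_embeds, out.text_embeds)
print(f"predicted realness: {score.item():.3f}")
```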

Dense Realness Mapping of AI Images

Researchers have developed REALM to evaluate and pinpoint inconsistencies in images generated by artificial intelligence, addressing a critical need for assessing generative AI performance and improving photorealistic image creation. The team enhanced existing datasets with textual descriptions of visual inconsistencies, generated by state-of-the-art vision-language models, which serve as a proxy for human assessment and provide a richer understanding of image realism. Experiments show that integrating these textual descriptions with image features significantly improves objective realness estimation, a process termed Cross-modal Objective Realness Estimation (CORE). The approach also delivers a dense realness mapping framework, DREAM, capable of pinpointing unrealistic regions at the pixel level and providing interpretable results for understanding image quality. Detailed analysis validated the framework's ability to accurately identify and localize areas where AI-generated images deviate from photorealism.
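
One rough way to picture the dense-mapping idea is to score local crops of an image against the textual description of a flaw and assemble the scores into a coarse spatial grid. The sliding-crop scheme, grid size, and use of raw CLIP similarity below are illustrative assumptions, not DREAM's actual construction, which operates at a much finer, pixel-level granularity.

```python
# Illustrative sketch of a coarse realness map: score crops of an image against
# an inconsistency description with CLIP; high similarity marks regions likely
# to contain the described flaw. Grid size, crop scheme, and the use of plain
# CLIP similarity are assumptions for illustration only.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def coarse_realness_map(image, description, grid=4):
    """Return a (grid x grid) tensor; higher = closer match to the flaw text."""
    w, h = image.size
    txt = proc(text=[description], return_tensors="pt", padding=True)
    with torch.no_grad():
        txt_emb = clip.get_text_features(**txt)
        txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
        scores = torch.zeros(grid, grid)
        for i in range(grid):
            for j in range(grid):
                crop = image.crop((j * w // grid, i * h // grid,
                                   (j + 1) * w // grid, (i + 1) * h // grid))
                px = proc(images=crop, return_tensors="pt")
                img_emb = clip.get_image_features(**px)
                img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
                scores[i, j] = (img_emb @ txt_emb.T).item()
    return scores  # upsample (e.g. bilinearly) for a per-pixel style map

image = Image.open("generated_image.png")  # hypothetical input
flaw_map = coarse_realness_map(image, "a distorted, six-fingered hand")
print(flaw_map)
```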

Visual and Textual Realness Evaluation with REALM

Scientists present REALM, a framework designed to assess and pinpoint inconsistencies in images generated by artificial intelligence. By combining visual features with textual descriptions of image content, the method evaluates the perceptual realness of AI-generated images more accurately than approaches that rely on a single data type. Importantly, REALM also generates detailed maps highlighting unrealistic regions within an image, offering a degree of explainability for its assessments. The research demonstrates that visual and textual information provide complementary context, leading to more accurate realness evaluation. The authors acknowledge, however, that current limitations stem from inaccuracies in the descriptions produced by the language model employed, particularly for specific visual characteristics such as facial distortions. Future work will focus on fine-tuning open-source vision-language models with human-labeled data and on improving the precision of the realness maps through image-text matching techniques that consider relational context, yielding a more robust and accurate system for evaluating AI-generated images.

👉 More information
🗞 Image Realness Assessment and Localization with Multimodal Features
🧠 ArXiv: https://arxiv.org/abs/2509.13289
