The ability of artificial intelligence to understand and reason about diverse cultures remains a significant challenge, and current benchmarks often lack the necessary depth and focus to adequately assess this capability. To address this, Arijit Maji, Raghvendra Kumar, and Akash Ghosh, all from the Indian Institute of Technology Patna, along with Anushka from Banasthali Vidyapeeth University, Nemil Shah from Pandit Deendayal Energy University, and Abhilekh Borah from Manipal University Jaipur, introduce DRISHTIKON, a novel benchmark designed to rigorously test language models’ understanding of Indian culture. This unique resource moves beyond generic global assessments by providing a detailed, multimodal dataset encompassing over 64,000 aligned text-image pairs that represent the diverse regions and rich cultural heritage of India. The creation of DRISHTIKON represents a vital step towards building more inclusive AI systems, offering researchers a robust platform to evaluate and improve the cultural competency of increasingly powerful language models.
Culturally Grounded Reasoning in Language Models
This research details a comprehensive study evaluating the cultural reasoning abilities of large language models (LLMs). Scientists aimed to create a framework for evaluating an LLM’s ability to perform commonsense cultural reasoning, multi-hop reasoning, and analogical reasoning, addressing a significant gap in current AI evaluation benchmarks which often lack cultural sensitivity. The methodology involved using images of cultural artifacts combined with questions, forcing the models to ground their reasoning in visual information, and developing a novel prompting strategy inspired by classical Indian epistemology to encourage nuanced responses. The research team introduced a culturally informed chain-of-thought prompting strategy, guiding the model through a specific reasoning process rooted in Indian philosophical traditions.
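The staged prompting idea described above can be sketched in code. This is a minimal, hypothetical illustration of a culturally informed chain-of-thought template; the stage names and wording are assumptions for illustration, not the paper's actual prompt.

```python
# Hypothetical sketch of a culturally informed chain-of-thought prompt.
# The stage names below are illustrative, not the paper's exact wording.

def build_cultural_cot_prompt(question: str, region: str) -> str:
    """Compose a prompt that walks the model through staged reasoning
    before it commits to an answer."""
    stages = [
        "1. Observe: describe the culturally salient elements in the image.",
        "2. Recall: state relevant cultural knowledge about "
        f"{region} (festivals, attire, cuisine, art, heritage).",
        "3. Connect: link the visual evidence to that knowledge.",
        "4. Conclude: give the final answer with a one-line justification.",
    ]
    return (
        "You are answering a question about Indian culture. "
        "Reason step by step through the stages below.\n"
        + "\n".join(stages)
        + f"\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_cultural_cot_prompt(
    "Which festival is being celebrated in the image?", "West Bengal"
)
```

The point of such a template is to force the model to ground its answer in visual evidence (stage 1) before retrieving cultural knowledge (stage 2), rather than pattern-matching directly to an answer.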
Detailed templates were created for generating questions that test cultural commonsense, require connecting multiple facts, and demand drawing parallels between cultural concepts. Importantly, the study extends beyond English-centric evaluation, which is crucial for building truly inclusive AI systems. Detailed error analysis reveals specific cultural reasoning challenges that LLMs still struggle with, such as relying on surface-level visual patterns rather than deep cultural understanding. This work represents a rigorous attempt to close a critical gap in AI evaluation, offering a valuable framework for building more culturally sensitive systems and insight into the difficulties LLMs face when reasoning about culture.
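The three question types map naturally onto fill-in-the-slot templates. The templates and slot names below are illustrative assumptions, not the benchmark's actual wording:

```python
# Illustrative templates for the three reasoning categories.
# Wording and slot names are assumptions, not the benchmark's own templates.

TEMPLATES = {
    # Tests everyday cultural knowledge directly.
    "commonsense": (
        "During {festival}, what item would people in {region} "
        "traditionally use?"
    ),
    # Requires chaining two facts: dish -> region -> festival.
    "multi_hop": (
        "The dish shown originates in {region}. Which festival in that "
        "region is it most associated with?"
    ),
    # Demands a parallel between two cultural concepts.
    "analogy": (
        "{artform_a} is to {region_a} as which art form is to {region_b}?"
    ),
}

def fill(category: str, **slots) -> str:
    """Instantiate a question template with concrete cultural entities."""
    return TEMPLATES[category].format(**slots)

q = fill("analogy", artform_a="Kathakali", region_a="Kerala",
         region_b="Odisha")
```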
DRISHTIKON Benchmark Assesses Indian Cultural Understanding
Scientists engineered DRISHTIKON, a novel benchmark designed to rigorously evaluate the cultural understanding of generative AI systems, with a specific focus on Indian culture. Unlike existing benchmarks, DRISHTIKON provides deep coverage of India’s diverse regions, incorporating over 64,000 aligned text-image pairs spanning all 28 states and 8 union territories. The study meticulously constructed a dataset capturing rich cultural themes, including festivals, attire, cuisines, art forms, and historical heritage, to provide a comprehensive assessment of AI capabilities. A balanced subset of questions was selected to ensure equitable regional representation, then augmented into three reasoning categories: Common Sense Cultural, Multi-hop Reasoning, and Analogy. Recognizing India’s linguistic diversity, the team extended DRISHTIKON into a multilingual benchmark by translating all questions into 14 Indian languages, harnessing the Gemini Pro language model for translation and implementing a two-stage human verification protocol to ensure accuracy and cultural relevance. This resulted in a comprehensive dataset comprising over 64,000 question-image-language triples, spanning 36 regions and 16 cultural themes.
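A question-image-language triple and the balanced regional sampling step could look roughly like the sketch below. Field names and the sampling routine are assumptions for illustration, not the authors' released schema:

```python
# Sketch of a question-image-language record and balanced per-region
# sampling. Field names are assumptions, not the released dataset schema.

import random
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Triple:
    question: str
    image_path: str
    language: str   # one of 15 languages (English + 14 Indian languages)
    region: str     # one of 36 states and union territories
    theme: str      # one of 16 cultural themes
    category: str   # "commonsense" | "multi_hop" | "analogy"

def balanced_subset(triples, per_region: int, seed: int = 0):
    """Draw an equal number of items from every region so that no single
    region dominates the evaluation set."""
    rng = random.Random(seed)
    by_region = defaultdict(list)
    for t in triples:
        by_region[t.region].append(t)
    subset = []
    for region, items in sorted(by_region.items()):
        rng.shuffle(items)
        subset.extend(items[:per_region])
    return subset
```

Balancing per region before augmenting into the three reasoning categories is what guarantees the "equitable regional representation" the section describes.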
DRISHTIKON Dataset Assesses Indian Cultural Understanding
Scientists introduced DRISHTIKON, a novel multimodal and multilingual benchmark specifically designed to evaluate cultural understanding in generative AI systems, with a unique focus on Indian culture. This work addresses a critical gap in existing benchmarks, which typically adopt a global scope and lack the fine-grained regional coverage essential for assessing culturally nuanced reasoning. The DRISHTIKON dataset spans 15 languages and encompasses all 28 Indian states and 8 union territories, incorporating a total of over 64,000 aligned text-image pairs. The dataset captures a rich tapestry of cultural themes, including traditional festivals, attire, cuisines, art forms, and historical heritage. Researchers conducted a large-scale evaluation of numerous state-of-the-art vision-language models (VLMs), including both open-source and proprietary systems, ranging in size from 256 million to 27 billion parameters. Experiments were conducted under both zero-shot and chain-of-thought prompting paradigms, revealing critical limitations in current models’ ability to reason over culturally grounded multimodal inputs, particularly for low-resource languages and less-documented regional traditions.
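The two prompting paradigms can be captured in a minimal evaluation loop. This is a sketch under stated assumptions: `model.generate` is a placeholder for whichever VLM API is being evaluated, and exact-match accuracy stands in for the paper's actual metric.

```python
# Minimal sketch of zero-shot vs. chain-of-thought evaluation.
# `model.generate` is a placeholder, not a real library call.

def make_prompt(question: str, mode: str) -> str:
    """Build the prompt for one of the two paradigms."""
    if mode == "zero_shot":
        return f"Question: {question}\nAnswer:"
    if mode == "cot":
        return f"Question: {question}\nThink step by step, then answer:"
    raise ValueError(f"unknown mode: {mode}")

def evaluate(model, dataset, mode: str) -> float:
    """Exact-match accuracy of a vision-language model under one
    prompting paradigm."""
    correct = 0
    for item in dataset:
        prompt = make_prompt(item["question"], mode)
        answer = model.generate(image=item["image"], prompt=prompt)
        correct += answer.strip().lower() == item["gold"].strip().lower()
    return correct / len(dataset)
```

Running `evaluate` once per mode on the same dataset is what allows the zero-shot and chain-of-thought results to be compared model by model.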
DRISHTIKON, A New Indian Cultural Benchmark
The research team introduced DRISHTIKON, a new benchmark designed to assess the cultural understanding of artificial intelligence systems. Unlike existing datasets with a global focus, DRISHTIKON concentrates specifically on Indian culture, encompassing 15 languages and a wide range of cultural themes such as festivals, cuisine, and historical heritage. The dataset consists of over 64,000 aligned text-image pairs, providing a detailed resource for evaluating vision-language models. Evaluations across various models reveal significant limitations in culturally grounded reasoning. While larger, instruction-tuned models demonstrate strong performance, compact models and those developed in India also achieve competitive results, suggesting potential for efficient and culturally aligned AI development.
The findings underscore the need for inclusive AI systems that can effectively process and understand diverse cultural contexts. The authors acknowledge that DRISHTIKON, despite its breadth, cannot fully capture the complete spectrum of India’s regional and linguistic nuances, and that current vision-language models still struggle with abstract reasoning and multi-step problem solving. Future work should focus on developing more inclusive datasets, culturally sensitive training methods, and robust reasoning frameworks to promote equitable AI development for all languages and cultures.
👉 More information
🗞 DRISHTIKON: A Multimodal Multilingual Benchmark for Testing Language Models’ Understanding on Indian Culture
🧠 ArXiv: https://arxiv.org/abs/2509.19274
