A survey of 303 researchers across multiple disciplines revealed consensus on three core criteria defining intelligence: generalisation, adaptability, and reasoning. However, only 29% currently view natural language processing systems as intelligent, with just 16.2% prioritising the development of genuinely intelligent systems in their research.
The pervasive use of the term ‘artificial intelligence’ within natural language processing (NLP) often obscures a fundamental question: what constitutes ‘intelligence’ itself? A recent investigation by Bertram Højer, Terne Sasha Thorn Jakobsen, Anna Rogers, and Stefan Heinrich, of the IT University of Copenhagen and the University of Copenhagen, addresses this ambiguity by surveying 303 researchers across multiple disciplines, including machine learning, cognitive science, linguistics, and neuroscience. Their work, entitled “Research Community Perspectives on ‘Intelligence’ and Large Language Models”, reveals a consensus on three core criteria defining intelligence – generalisation, adaptability, and reasoning. It also demonstrates that a majority of researchers do not consider existing NLP systems genuinely intelligent, and that developing ‘intelligent’ systems is not a primary research objective for most.
The findings paint a complex and often critical picture of how the scientific community applies the label ‘intelligence’ to artificial systems. Respondents identify generalisation, adaptability, and reasoning as the primary criteria for evaluating intelligence, yet they remain substantially sceptical of current AI: approximately 71% do not consider contemporary systems to be genuinely intelligent.
Detailed analysis of specific cognitive attributes reveals particularly strong scepticism: 85–90% of respondents disagree that current systems exhibit common-sense reasoning, and over 90% disagree that they possess consciousness. This suggests a significant divergence between public perception and expert opinion.
Qualitative responses expand upon these quantitative findings, with researchers frequently questioning the very definition of intelligence. A recurring concern is that the concept carries historically rooted biases – linked to problematic ideologies and an inherent anthropocentrism that defines intelligence solely in terms of human capabilities – and that clear boundaries between intelligent and unintelligent systems are difficult to establish.
Researchers consistently emphasise the importance of context and purpose: AI demonstrably excels within narrow, well-defined domains but lacks the broad, flexible intelligence exhibited by humans and animals. The responses point to a desire for a more inclusive, nuanced understanding of cognitive abilities, and to a divergence between the popular framing of ‘artificial intelligence’ and the perceptions of those actively working in the field.
A key distinction emerges between narrow, task-specific AI capabilities and the broader, adaptable intelligence characteristic of biological systems. While AI can outperform humans at specific tasks, it currently lacks the general cognitive flexibility required to navigate complex, real-world scenarios. Participants repeatedly highlight the importance of embodiment – physical presence and interaction with the environment – suggesting that genuine intelligence may require it. This raises the question of whether current AI truly possesses intelligence or merely simulates it.
Respondents also critique the survey’s binary response options, advocating for more nuanced scales that allow degrees of agreement or uncertainty, and highlighting the need for instruments that capture the complexities of cognitive assessment. The validity of benchmarks such as the Turing test – a test of a machine’s ability to exhibit behaviour indistinguishable from that of a human – as meaningful indicators of genuine understanding also warrants further scrutiny. Fewer than 10% of respondents affirm the presence of emotional capacity or consciousness in current AI.
Future research should investigate the role of embodied cognition in developing more robust and adaptable AI and focus on refining assessment methodologies. A more rigorous and nuanced approach to defining and evaluating intelligence in artificial systems is clearly required.
👉 More information
🗞 Research Community Perspectives on “Intelligence” and Large Language Models
🧠 DOI: https://doi.org/10.48550/arXiv.2505.20959
