Large Language Models’ Ontological Capabilities Assessed by New Benchmark Dataset

Research introduces OntoURL, a benchmark evaluating how large language models (LLMs) handle ontologies – formal systems that represent knowledge through concepts and relationships. Evaluation of 20 open-source LLMs, using 58,981 questions drawn from 40 ontologies spanning eight domains, reveals that models excel at understanding ontological knowledge but struggle with reasoning and learning. This demonstrates that current LLMs have clear limitations in manipulating symbolic knowledge, and establishes OntoURL as a tool for assessing progress in integrating LLMs with formal knowledge systems.

The capacity of large language models (LLMs) to process and apply structured knowledge remains a key area of investigation as these systems become increasingly integrated into complex applications. While adept at pattern recognition and natural language processing, their ability to manipulate formal, symbolic representations of knowledge – known as ontologies – has received less scrutiny. Researchers at the University of Groningen and Leiden University have addressed this gap with the development of a new benchmark, OntoURL, designed to rigorously evaluate LLMs’ ontological capabilities across understanding, reasoning, and learning. Xiao Zhang, Huiyuan Lai, and Johan Bos, from the CLCG at the University of Groningen, collaborated with Qianru Meng from LIACS at Leiden University to present their findings in a paper titled ‘OntoURL: A Benchmark for Evaluating Large Language Models on Symbolic Ontological Understanding, Reasoning and Learning’.

Evaluating Symbolic Reasoning in Large Language Models

Recent work introduces OntoURL, a benchmark designed to systematically assess the capacity of large language models (LLMs) to process and apply formal, symbolic knowledge encoded within ontologies. Ontologies are formal representations of knowledge as a set of concepts within a domain and the relationships between those concepts. This structured approach contrasts with the predominantly statistical methods used in LLM training.
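To make the contrast concrete, an ontology can be pictured as a set of subject–predicate–object statements over named concepts. The sketch below is purely illustrative – the concept names are invented and do not come from OntoURL or its source ontologies – but it shows the kind of explicit, symbolic structure involved, as opposed to the statistical associations an LLM learns.

```python
# Illustrative sketch only: a tiny ontology as subject-predicate-object
# triples. Concept names are hypothetical, not taken from OntoURL.
ontology = {
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("Dog", "hasPart", "Tail"),
}

def relations_of(concept, ontology):
    """Return every (predicate, object) pair asserted for a concept."""
    return {(p, o) for (s, p, o) in ontology if s == concept}

print(relations_of("Dog", ontology))
# {('subClassOf', 'Mammal'), ('hasPart', 'Tail')}
```

Identifying such directly stated facts is roughly what the benchmark's "understanding" dimension probes; the harder dimensions ask for conclusions that are not written down explicitly.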

The OntoURL benchmark consists of almost 60,000 questions generated from 40 distinct ontologies, offering a standardised method for evaluating LLM performance across three key dimensions: understanding, reasoning, and learning.

Results indicate that current open-source LLMs demonstrate a degree of competence in understanding ontological knowledge – successfully identifying concepts and relationships as defined within the ontologies. However, substantial limitations become apparent when evaluating reasoning and learning abilities. Models consistently underperform on tasks demanding inferential steps – drawing conclusions from stated facts – or the application of ontological knowledge to previously unseen scenarios. This discrepancy suggests a fundamental difference between an LLM’s capacity for memorisation and its ability to perform genuine symbolic manipulation.

Performance also varies significantly both between different LLMs and across different ontologies. This variability indicates a need for further investigation into the factors influencing performance, such as the complexity of the ontology or the specific training data used for the LLM. The study underscores the necessity for novel approaches to improve LLM reasoning capabilities and facilitate the effective utilisation of structured knowledge.

👉 More information
🗞 OntoURL: A Benchmark for Evaluating Large Language Models on Symbolic Ontological Understanding, Reasoning and Learning
🧠 DOI: https://doi.org/10.48550/arXiv.2505.11031
