The quest to model complex causal relationships drives progress across numerous scientific disciplines, and researchers are now leveraging large language models to build far more comprehensive systems. Sridhar Mahadevan of Adobe Research and the University of Massachusetts, Amherst, along with colleagues, introduces a new approach to constructing large causal models that exploits the latent knowledge within these advanced artificial intelligence systems. Their work centres on DEMOCRITUS, a system designed to build, organise and visualise causal models spanning diverse fields, moving beyond traditional, narrow-domain causal inference. The method generates causal questions, extracts causal statements from the model's textual responses, and weaves these fragmented claims into a coherent relational structure, representing a substantial step towards modelling complex systems and understanding causality at scale.
Categorical Causal Inference for Artificial Intelligence
This research explores a new approach to building artificial intelligence systems that can understand cause-and-effect relationships, moving beyond simply identifying correlations. Scientists are investigating how to represent causality using category theory, a branch of mathematics that provides a powerful framework for modeling complex systems. This approach aims to create AI that can reason about the world in a more robust and flexible way, bridging the gap between traditional causal inference and modern AI techniques like large language models. The work represents causal relationships using diagrams and mathematical structures, allowing a more abstract and compositional understanding of causality. Researchers connect these categorical structures to the implementation of neural networks, specifically through backpropagation, and draw on more advanced machinery, such as topos theory, to organize many interrelated causal models.
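To make the categorical viewpoint concrete, a common functorial-semantics picture (a sketch of the standard construction, not necessarily the exact formalism used in the paper) reads a causal diagram as a category of mechanisms and a causal model as a functor into a category of stochastic maps:

```latex
% A causal diagram read as the free category C generated by a DAG:
%   objects    = variables X, Y, Z, ...
%   morphisms  = causal mechanisms f : X -> Y (and their composites)
% A causal model is then a functor F : C -> Stoch, sending each variable
% to a measurable space and each mechanism to a Markov kernel, so that
% chaining mechanisms corresponds to composing kernels:
\[
  F(g \circ f) = F(g) \circ F(f),
  \qquad
  \bigl(F(g) \circ F(f)\bigr)(x, A) = \int_{F(Y)} F(g)(y, A)\, F(f)(x, \mathrm{d}y).
\]
```

The point of this picture is compositionality: larger causal models are assembled by composing and pasting such functors, which is the kind of structure the more advanced topos-theoretic machinery is meant to organize.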
Large Causal Models from Language and Learning
Scientists have developed DEMOCRITUS, a new system for constructing large causal models by leveraging the capabilities of large language models. This work introduces a paradigm shift in causal discovery, moving beyond traditional methods that rely on numerical data from narrow studies. DEMOCRITUS builds models spanning hundreds of domains and encompassing millions of specific causal claims by combining language-model knowledge with categorical and deep-learning techniques. The system uses a highly optimized version of Qwen3-Next-80B-A3B-Instruct as its underlying language model, demonstrating its capacity to process complex information and generate detailed causal relationships.
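A minimal sketch of the discovery step, assuming a hypothetical `llm_complete(prompt)` helper that stands in for whatever interface serves the Qwen3-Next-80B-A3B-Instruct model; the prompts and parsing below are illustrative, not the system's actual ones:

```python
# Sketch of an LLM-driven causal discovery loop (illustrative, not the
# paper's implementation). `llm_complete(prompt) -> str` is a hypothetical
# wrapper around a chat-completion endpoint serving the instruct model.

def propose_topics(domain: str, llm_complete, n: int = 5) -> list[str]:
    """Ask the model for subtopics of a domain worth causal analysis."""
    reply = llm_complete(
        f"List {n} subtopics of '{domain}' suitable for causal analysis, "
        "one per line."
    )
    return [line.strip("-• ").strip() for line in reply.splitlines() if line.strip()]

def extract_causal_claims(topic: str, llm_complete) -> list[tuple[str, str]]:
    """Ask for cause -> effect statements and parse them into pairs."""
    reply = llm_complete(
        f"State causal relationships in '{topic}' as lines of the form "
        "'CAUSE -> EFFECT'."
    )
    claims = []
    for line in reply.splitlines():
        if "->" in line:
            cause, effect = line.split("->", 1)
            claims.append((cause.strip(), effect.strip()))
    return claims
```

Each extracted (cause, effect) pair then becomes an edge in the relational graph that the later modules of the pipeline process.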
The core of DEMOCRITUS is a six-module pipeline that begins with the language model acting as a discovery engine for domain topics, causal questions, and causal statements. A Geometric Transformer layer then processes the resulting relational graph, producing a manifold of node embeddings. These manifolds are organized as slices of a larger topos, enabling querying, visualization, and selective refinement of the causal model. The researchers applied DEMOCRITUS to a case study of the collapse of the Indus Valley Civilization, using topics such as the Indus Valley Civilization, the Holocene monsoon, and wheat/barley agriculture, an exercise that demonstrates the system's potential and illustrates its limitations.
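The embedding stage can be pictured with a generic transformer encoder standing in for the paper's Geometric Transformer layer; the per-topic dictionary below plays the role of the topos "slices", and the random node features are placeholders for real text-derived features of the causal variables:

```python
# Sketch of the embedding stage (illustrative): each topic's causal graph is
# embedded with a generic transformer encoder, a stand-in for the paper's
# Geometric Transformer layer, and the per-topic results are kept as "slices".
import torch
import torch.nn as nn

def embed_topic_graph(node_names: list[str], d_model: int = 64) -> dict[str, torch.Tensor]:
    """Return one embedding per causal variable in a topic's graph."""
    feats = torch.randn(len(node_names), d_model)  # placeholder for text-derived node features
    encoder = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
        num_layers=2,
    )
    with torch.no_grad():
        out = encoder(feats.unsqueeze(0)).squeeze(0)  # (num_nodes, d_model)
    return dict(zip(node_names, out))

# Per-topic slices of the larger model (topic and variable names are illustrative).
slices = {
    "Holocene monsoon": embed_topic_graph(["monsoon weakening", "river discharge", "crop yield"]),
    "wheat/barley agriculture": embed_topic_graph(["crop yield", "food surplus", "urban settlement"]),
}
```

Keeping one embedded graph per topic is what allows selective refinement: a single slice can be re-queried and re-embedded without rebuilding the whole model.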
Causal Models Built From Language Data
The research team has developed DEMOCRITUS, a novel system that constructs explicit causal models by leveraging the capabilities of large language models. Unlike traditional causal inference methods that rely on numerical data from experiments, DEMOCRITUS extracts causal statements from text and weaves them into structured, navigable graphs. The system operates by first proposing relevant topics, then querying a language model for causal relationships within those topics, and finally converting these relationships into a network of interconnected variables. A key innovation is the use of geometric transformers to embed these local causal models within a broader framework, creating a reusable and inspectable representation of causal knowledge.
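Weaving individual claims into one navigable graph can be sketched with a standard directed graph library; the variable names below are illustrative, in the spirit of the Indus Valley case study, and are not outputs of DEMOCRITUS itself:

```python
# Sketch: merge extracted (cause, effect) claims into a single navigable
# causal graph and trace chains through it. Claims here are illustrative only.
import networkx as nx

claims = [
    ("Holocene monsoon weakening", "reduced river discharge", "Holocene monsoon"),
    ("reduced river discharge", "declining wheat/barley yields", "wheat/barley agriculture"),
    ("declining wheat/barley yields", "deurbanisation of Indus settlements", "Indus Valley Civilization"),
]

G = nx.DiGraph()
for cause, effect, topic in claims:
    G.add_edge(cause, effect, topic=topic)  # repeated variable names merge into shared nodes

# Query the woven graph for a causal chain between two variables of interest.
chain = nx.shortest_path(G, "Holocene monsoon weakening", "deurbanisation of Indus settlements")
print(" -> ".join(chain))
```

Because variables are merged by name across statements, claims generated under different topics link up into longer causal chains that can then be inspected or queried.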
DEMOCRITUS has been successfully applied across diverse domains including archaeology, biology, climate change, economics, and medicine, demonstrating its versatility and potential for broad application. The resulting models emerge from a multi-stage pipeline that builds a structured representation of causal relationships, allowing local causal models to be explored and deepened. The researchers acknowledge that the current system is limited by potential inaccuracies and biases in the extracted causal statements, and future work will focus on addressing these limitations and expanding the system's capabilities.
👉 More information
🗞 Large Causal Models from Large Language Models
🧠 ArXiv: https://arxiv.org/abs/2512.07796
