AI Learns Tasks with Surprisingly Short Programs

Researchers are investigating the extent to which large language models truly ‘learn’ during training or simply retrieve pre-existing knowledge, a debate encapsulated by the superficial alignment hypothesis. Tomás Vergara-Browne and Marius Mosbach, both from Mila Quebec AI Institute and McGill University, alongside Darshan Patil of Université de Montréal, Ivan Titov of the University of Edinburgh and the University of Amsterdam, Siva Reddy from McGill University, and Tiago Pimentel from ETH Zürich, present a novel framework for operationalising this hypothesis through a metric termed ‘task complexity’. This work is significant because it provides a unified definition, measuring task complexity as the length of the shortest program needed to achieve a target performance, and demonstrates that pre-training dramatically reduces this complexity for tasks including mathematical reasoning, translation and instruction following. Their findings suggest that while pre-trained models contain substantial knowledge, post-training adaptation can collapse the information needed to access strong performance from gigabytes to just a few kilobytes, highlighting the efficiency of this process.

Scientists are beginning to understand how much learning really happens within artificial intelligence systems. The ability to adapt to new challenges may not require extensive retraining, but may instead rely on accessing existing knowledge. This work offers a way to measure that hidden potential and to quantify how efficiently models can solve problems: it defines ‘task complexity’ not in terms of difficulty for humans, but as the length of the shortest possible computer program required to achieve a certain performance level.

This research suggests that pre-trained models already contain a vast amount of knowledge, and that subsequent training primarily serves to unlock and refine this existing capability. Rather than learning entirely new skills, these models are adapted by finding surprisingly short ‘programs’ within their existing parameters. The study introduces a framework grounded in algorithmic information theory that restates the superficial alignment hypothesis, the idea that most learning happens during initial pre-training, in terms of program length.

This means that if a task is truly ‘easy’ for a pre-trained model, only a small amount of additional information is needed to achieve high performance. Researchers tested this concept across mathematical reasoning, machine translation, and instruction following, using models like SmolLM3 and Olmo3. Findings indicate that programs as small as 151 kilobytes can be sufficient to adapt these models to perform well on certain tasks.

Yet accessing strong performance initially demands extensive programs, often measured in megabytes or even gigabytes. Post-training significantly reduces this complexity, however, allowing the same level of performance with programs orders of magnitude smaller. This highlights a key insight: adapting a pre-trained model often requires only a few kilobytes of new information.

Prior approaches to adaptation, such as finetuning on limited data, updating only a few parameters, or crafting effective prompts, are now viewed as different strategies for discovering these short, efficient programs. At its core, this work provides a quantitative lens through which to examine the superficiality of adaptation. By framing the problem in terms of algorithmic complexity, the research clarifies ongoing debates about how much genuine learning occurs during post-training.

The ability to perform new tasks is often already present within the model, waiting to be ‘unlocked’ with minimal additional information. This perspective has implications for designing more efficient adaptation methods and understanding the true extent of knowledge encoded within these powerful language models.

Measuring task difficulty via minimal program length for large language models

A central element of this work involves estimating task complexity, defined as the length of the shortest program capable of achieving a specified performance level on a given task. This metric offers a novel approach to evaluating the superficial alignment hypothesis, which suggests large language models primarily store knowledge during pre-training and access it later.

This research frames the hypothesis as a claim that pre-training substantially reduces the complexity of performing well on various tasks. Consequently, differing arguments supporting the hypothesis are interpreted as distinct strategies for identifying these short programs. To quantify task complexity, researchers focused on upper-bounding conditional complexity by constructing programs that adapt a pre-trained model to perform a task.
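To make this concrete, the quantity being bounded can be written as a conditional, Kolmogorov-style complexity. The notation below is an illustrative sketch rather than the paper’s exact formulation: C(t, s | M) denotes the complexity of task t at target score s, given a starting model M.

```latex
% Illustrative notation; the paper's symbols may differ.
\[
  C(t, s \mid M) \;=\; \min_{p} \bigl\{\, \ell(p) \;:\; \mathrm{score}\bigl(\mathrm{exec}(p, M),\, t\bigr) \ge s \,\bigr\}
\]
% \ell(p) is the length of program p in bits, and exec(p, M) runs p to adapt M.
% The true minimum is uncomputable, so any concrete adapting program p* yields
% the upper bound C(t, s | M) <= \ell(p*).
```

Comparing this quantity when M is a randomly initialised model versus a pre-trained one is what allows the question of how much pre-training reduces task complexity to be posed quantitatively.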

Performance was measured after executing these programs, with each program’s length establishing a natural upper limit on the task’s complexity. Three distinct strategies were employed to construct these programs, each leveraging training data to adapt the pre-trained model for improved task performance. Data methods trained the model on a subset of the available data, stored in compressed form using arithmetic coding to minimise program size.

The resulting program encompassed the compressed data alongside the code necessary for decompression and training. Parametric methods trained adapter weights, which are small trainable modules added to the pre-trained model, and encoded these weights directly into the program. Inference-control methods compressed prompts and appended them to the evaluation input, allowing the model to process the combined input. Program length was primarily determined by the size of the compressed data or parameters, with a constant overhead for code and other necessary components.
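As a rough illustration of how such program lengths might be tallied, the sketch below counts bits for each of the three strategies. The constant code overhead, the use of bz2 as a stand-in for the arithmetic coder, and the 16-bit adapter weights are assumptions made for the example, not figures from the paper.

```python
# Hypothetical accounting for the three program-construction strategies;
# names and constants are illustrative, not taken from the paper.
import bz2

CODE_OVERHEAD_BYTES = 4096  # assumed fixed cost of the decompression/training code


def data_program_bits(train_examples: list[str]) -> int:
    """Data strategy: compressed training subset plus fixed training code."""
    payload = "\n".join(train_examples).encode("utf-8")
    return 8 * (len(bz2.compress(payload)) + CODE_OVERHEAD_BYTES)


def adapter_program_bits(num_adapter_params: int, bits_per_param: int = 16) -> int:
    """Parametric strategy: adapter weights encoded directly in the program."""
    return num_adapter_params * bits_per_param + 8 * CODE_OVERHEAD_BYTES


def prompt_program_bits(prompt: str) -> int:
    """Inference-control strategy: a compressed prompt added to the evaluation input."""
    return 8 * (len(bz2.compress(prompt.encode("utf-8"))) + CODE_OVERHEAD_BYTES)


if __name__ == "__main__":
    examples = [f"Q: what is {i} + {i}? A: {2 * i}" for i in range(500)]
    print("data    :", data_program_bits(examples), "bits")
    print("adapter :", adapter_program_bits(num_adapter_params=100_000), "bits")
    print("prompt  :", prompt_program_bits("Answer each question step by step."), "bits")
```

Because all three strategies are scored on the same scale, bits of program needed to reach a target performance, their lengths can be compared directly, which is what lets the results treat data efficiency, parameter efficiency, and prompting as interchangeable routes to a short program.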

Pre-training substantially reduces program-length requirements for natural language processing

Task complexity measurements reveal that pre-trained models dramatically reduce the information needed to achieve strong performance on several natural language processing tasks. Initial analysis established that programs of just 151 kilobytes can effectively adapt large language models to attain notable results on tasks including mathematical reasoning, machine translation, and instruction following.

This figure demonstrates that substantial capability already resides within these models prior to task-specific adaptation. Further investigation into the evolution of task complexity during training highlighted a clear transition in program length requirements. Randomly initialised models necessitated programs measured in gigabytes to reach acceptable performance levels, indicating a lack of pre-existing knowledge.

Pre-training alone made strong performance accessible, though reaching it still demanded programs on the order of megabytes to gigabytes for optimal results. Post-training collapsed this complexity by several orders of magnitude, allowing the same level of performance to be achieved with programs measured in kilobytes. At the outset, the work quantified task complexity across three distinct NLP tasks, revealing a significant disparity between pre-trained and randomly initialised models.

For instance, achieving a target performance on mathematical reasoning with a randomly initialised model demanded programs exceeding 10^9 bits, while leveraging a pre-trained model reduced this requirement to under 10^7 bits. Still, accessing strong performance with pre-trained models initially required programs of substantial length, typically ranging from 10^6 to 10^7 bits.

By contrast, post-training adaptation consistently yielded programs below 10^5 bits, representing a substantial reduction in complexity. This work unifies data efficiency, parameter efficiency, and inference-time control as different strategies for finding short programs to solve a task, allowing researchers to directly compare program lengths.
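For orientation, the bit counts above can be translated back into the byte figures quoted earlier in the article (assuming binary kilobytes):

```latex
\[
  151\ \mathrm{kB} \approx 151 \times 1024 \times 8\ \mathrm{bits} \approx 1.2 \times 10^{6}\ \mathrm{bits},
  \qquad
  10^{5}\ \mathrm{bits} \approx 12\ \mathrm{kB}.
\]
```

So the 151-kilobyte adaptation programs sit around 10^6 bits, while the sub-10^5-bit programs produced after post-training are on the order of a dozen kilobytes.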

Quantifying adaptability reveals how pre-training shapes large language model performance

Scientists have long sought to understand precisely what large language models actually learn during their initial, intensive pre-training phase. For years, the debate has centred on the ‘superficial alignment hypothesis’, a compelling idea suggesting these models already contain most of the knowledge they will ever possess, and subsequent training simply unlocks it.

Defining ‘superficial alignment’ proved surprisingly elusive, leading to varied interpretations and hindering concrete progress. This research offers a fresh perspective, framing the problem not as one of knowledge acquisition but of complexity: specifically, how much information is needed to adapt a pre-trained model to a new task. A key finding is that post-training dramatically reduces the ‘program length’ (a measure of informational need) required to achieve strong performance, sometimes by several orders of magnitude.

This suggests adaptation isn’t about adding substantial new knowledge, but about efficiently accessing and organising what’s already there. Still, the initial access to this pre-existing knowledge can demand surprisingly large programs, measured in gigabytes, highlighting a current limitation in efficiently surfacing this potential. Beyond the technical details, this work shifts the conversation.

It moves beyond simply demonstrating that models can perform tasks, and towards understanding the fundamental limits of adaptation. While the current metrics focus on program length, future work could explore the energy cost of adaptation, offering a more holistic view of efficiency. Further investigation is needed to determine if this compression principle applies equally across all task types, or if some domains remain stubbornly complex. Ultimately, this research doesn’t solve the alignment problem, but it provides a powerful new lens through which to view it, potentially guiding the development of more efficient and adaptable artificial intelligence systems.

👉 More information
🗞 Operationalising the Superficial Alignment Hypothesis via Task Complexity
🧠 ArXiv: https://arxiv.org/abs/2602.15829

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
