LLM Discovery Advances Vision Models with Non-Standard Channel Priors and Vast Data Generation

Optimising the architecture of deep neural networks, specifically the configuration of channels within layers, remains a significant challenge due to the vast number of possibilities and computational limitations. Tolgay Atinc Uzun, Dmitry Ignatov, and Radu Timofte from the University of Würzburg present a novel approach utilising Large Language Models (LLMs) to navigate this complex search space. Their research explores an LLM-driven Neural Architecture Search (NAS) framework, framing the process as a series of code generation tasks informed by model performance. By generating a substantial dataset of valid network configurations using Abstract Syntax Tree (AST) mutations, the team overcomes the issue of limited training data and enables the LLM to learn the subtle relationships between channel arrangements and resulting accuracy. This work demonstrates statistically significant improvements on the CIFAR-100 dataset, suggesting LLMs can effectively internalise and apply domain-specific knowledge to optimise deep learning models.

This research investigates an LLM-driven NAS framework for channel configuration, framing the search process as a sequence of conditional code generation tasks. To bootstrap training data, the framework applies AST mutations to existing network definitions, producing a large corpus of syntactically valid variants. These mutated networks, while not necessarily high-performing themselves, provide the LLM with sufficient data to learn the rules of valid architectural construction.
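To make the bootstrapping step concrete, the sketch below shows how channel-width mutations via Python’s ast module might look, assuming the networks are defined in PyTorch-style code. The class names, mutation policy, and width choices are illustrative assumptions, not the authors’ implementation.

```python
import ast
import random

MODEL_SRC = """
import torch.nn as nn

class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.head = nn.Linear(64, 100)
"""

class ChannelMutator(ast.NodeTransformer):
    """Rewrite channel-width literals in Conv2d/Linear calls so that the
    out-channels of one layer always equal the in-channels of the next,
    keeping the mutated network tensor-consistent."""

    def __init__(self, widths):
        self.widths = widths   # one new width per conv layer, in source order
        self.layer = 0         # index of the conv layer being rewritten

    def visit_Call(self, node):
        self.generic_visit(node)
        name = getattr(node.func, "attr", "")
        if name == "Conv2d" and len(node.args) >= 2:
            if self.layer > 0:
                # in_channels must match the previous layer's new width
                node.args[0] = ast.Constant(self.widths[self.layer - 1])
            node.args[1] = ast.Constant(self.widths[self.layer])
            self.layer += 1
        elif name == "Linear" and node.args:
            # the classifier consumes the last conv layer's output width
            node.args[0] = ast.Constant(self.widths[self.layer - 1])
        return node

def mutate(src, choices=(16, 24, 32, 48, 64, 96, 128)):
    n_convs = src.count("Conv2d")
    widths = [random.choice(choices) for _ in range(n_convs)]
    tree = ChannelMutator(widths).visit(ast.parse(src))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree), widths   # ast.unparse requires Python 3.9+

variant_src, widths = mutate(MODEL_SRC)
print(widths)
print(variant_src)
```

Each run emits a new, runnable model definition whose widths differ but whose layer interfaces still agree, which is exactly the kind of cheap, valid variant a bootstrapping corpus needs.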

The approach centres on iteratively generating candidate architectures, evaluating their performance, and using the results to guide the LLM’s subsequent code generation. Through this loop, the LLM learns a policy for producing architectures that are both valid and likely to perform well, each expressed as code ready for direct implementation and evaluation. By leveraging the LLM’s ability to understand and generate code, the framework sidesteps a limitation of traditional NAS methods, which generally treat a model’s definition as a fixed graph and fail to exploit the semantic structure of the code itself. The use of AST mutations to generate a large corpus of valid architectures gives the LLM enough examples to learn a robust policy for architectural refinement, simplifying implementation and enabling rapid experimentation. The study concentrates specifically on optimising channel configurations, modifying layer widths while ensuring consistency across interconnected components, and validates the approach on CIFAR-100, where the LLM-driven search yields statistically significant improvements in accuracy.
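A hedged sketch of what such a closed generate-evaluate-refine loop could look like is given below; llm_generate, train_and_eval, and the prompt format are hypothetical placeholders standing in for whatever interfaces the paper actually uses.

```python
def llm_generate(prompt: str) -> str:
    """Hypothetical placeholder for an LLM call returning Python source."""
    raise NotImplementedError

def train_and_eval(model_src: str) -> float:
    """Hypothetical placeholder: build the model from source, train briefly
    on CIFAR-100, and return validation accuracy."""
    raise NotImplementedError

def closed_loop_search(seed_src: str, rounds: int = 20, k_context: int = 5):
    """Generate -> evaluate -> refine, conditioned on the best results so far."""
    history = [(train_and_eval(seed_src), seed_src)]
    for _ in range(rounds):
        # Show the LLM the k best (accuracy, code) pairs seen so far.
        best = sorted(history, key=lambda p: p[0], reverse=True)[:k_context]
        prompt = "\n\n".join(
            f"# accuracy = {acc:.4f}\n{src}" for acc, src in best
        ) + "\n\n# Propose an improved channel configuration:\n"
        candidate = llm_generate(prompt)
        try:
            acc = train_and_eval(candidate)
        except Exception:
            continue   # discard candidates that fail to compile or run
        history.append((acc, candidate))
    return max(history, key=lambda p: p[0])
```

The key design choice is that the feedback signal (measured accuracy) is fed back to the LLM as plain text alongside the code it conditioned on, so the model refines architectures the same way it would continue any other code-generation prompt.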

The team successfully trained the LLM to acquire domain-specific architectural priors, enabling it to distinguish optimised designs from random configurations. This programmatic bootstrapping mechanism, utilising AST manipulation, generated a large dataset of syntactically valid and tensor-consistent network variants, resolving the initialisation problem common in learning-based search methods. The study focused specifically on optimising channel configurations, requiring the maintenance of consistency across network components like residual connections. Training on the programmatically generated dataset allowed the LLM to learn both the syntax required for executable code and the correlation between channel dimensions and overall performance.
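The consistency requirement is easiest to see in a residual block, where the main path and the skip path must end with the same channel count. The toy PyTorch block below is an illustrative assumption rather than the paper’s architecture, but it shows why mutating one width forces coupled changes elsewhere.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Toy block illustrating the consistency constraint: the main path and
    the skip path must produce the same channel count, otherwise the
    element-wise addition fails at runtime."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # If a mutation widens or narrows out_ch away from in_ch, a 1x1
        # projection must appear on the skip path to keep shapes compatible.
        self.skip = nn.Identity() if in_ch == out_ch else nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

# A generated variant is only valid if every such coupled pair of widths is
# updated together; a quick forward pass catches violations early.
x = torch.randn(1, 32, 8, 8)
print(ResidualBlock(32, 48)(x).shape)   # torch.Size([1, 48, 8, 8])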

Results demonstrate that the LLM functions effectively as an optimiser, iteratively refining architectures to enhance accuracy, differing from methods that treat channel counts as continuous numerical actions. Furthermore, the research distinguishes itself from techniques like network slimming by optimising the architecture definition itself before training commences. By framing the search as a conditional code generation task, researchers moved beyond traditional graph or numerical methods, enabling optimisation while adhering to strict structural constraints. Experiments conducted on the CIFAR-100 dataset demonstrate statistically significant gains in accuracy, with the best performing model achieving a 24.1% relative improvement over the initial distribution.
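Assuming the standard definition of relative improvement (the summary does not state it explicitly), the reported figure would be computed as in the short sketch below; the accuracy values shown are placeholders, not results from the paper.

```python
def relative_improvement(best_acc: float, baseline_acc: float) -> float:
    """Relative gain of the best model over a baseline: (best - base) / base."""
    return (best_acc - baseline_acc) / baseline_acc

# Placeholder values for illustration only; the paper's accuracies are not
# quoted in this summary.
print(f"{relative_improvement(0.62, 0.50):.1%}")   # prints 24.0%
```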

Analysis of the population-level results indicates a systematic shift in the performance frontier, suggesting the language model successfully internalised architectural constraints and learned domain-specific knowledge. Furthermore, the framework identified unconventional channel priors, favouring irregular widths and late-stage expansion, resulting in parameter-efficient models. The authors acknowledge limitations related to the scope of the experiments, focusing specifically on channel configuration for vision networks. Future research directions include extending the framework to encompass broader NAS settings, such as topology search and hardware-aware optimisation. While the current work demonstrates the potential of language models in architectural design, further investigation is needed to fully explore its capabilities across diverse network architectures and datasets.

👉 More information
🗞 Closed-Loop LLM Discovery of Non-Standard Channel Priors in Vision Models
🧠 ArXiv: https://arxiv.org/abs/2601.08517

Rohail T.

As a quantum scientist exploring the frontiers of physics and technology, I focus on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
