Automated Architecture Design with LLMs Enables 100x Faster Validation for Computer Vision Networks

Automated design of neural networks remains a persistent challenge in computer vision, demanding methods that are both effective and computationally feasible given the diversity of tasks and resource constraints. Chandini Vysyaraju, Raghuvir Duvvuri, Avi Goyal, and colleagues at the University of Würzburg address this need by exploring large language models as an emerging alternative to traditional, computationally intensive neural architecture search. Their work systematically investigates how best to utilise these models for computer vision tasks, focusing on the crucial aspects of prompt engineering and validation. The team demonstrates that providing just three supporting examples significantly improves performance, and introduces a remarkably fast validation method that reduces processing time by a factor of one hundred and prevents redundant training of identical network designs, ultimately making automated network design accessible to a wider range of researchers.

The work introduces Few-Shot Architecture Prompting (FSAP), a systematic study of how the number of supporting examples influences LLM-based generation, and Whitespace-Normalized Hash Validation, a rapid deduplication method. Experiments across seven computer vision benchmarks (MNIST, CIFAR-10, CIFAR-100, CelebA, ImageNette, SVHN, and Places365) resulted in the generation of 1,900 unique network designs, providing a rich dataset for exploring automated design possibilities. The team measured the impact of varying the number of supporting examples (n) provided to the LLM, finding that n=3 best balances diversity and contextual focus for vision tasks.
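The paper's exact prompt template is not reproduced in this article, but the idea behind few-shot architecture prompting can be sketched: assemble n example architectures into the generation prompt and ask the model for a new design. Everything in the minimal sketch below (the example pool, the wording, and the `build_fsap_prompt` helper) is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch of few-shot architecture prompting (FSAP) with n supporting
# examples. The example pool and prompt wording are illustrative assumptions.
import random

EXAMPLE_POOL = [
    ("SimpleCNN", "Conv(3->32) -> ReLU -> MaxPool -> Conv(32->64) -> ReLU -> GAP -> Linear(64->10)"),
    ("ResNetMini", "Conv stem -> 3x residual blocks (64 ch) -> GAP -> Linear"),
    ("VGGStyle", "2x[Conv -> ReLU] -> MaxPool, repeated 3 times -> Flatten -> 2x Linear"),
    ("DPNStyle", "Dual-path blocks mixing residual and dense connections -> GAP -> Linear"),
]

def build_fsap_prompt(task: str, num_classes: int, n: int = 3) -> str:
    """Assemble a generation prompt containing n supporting example architectures."""
    shots = random.sample(EXAMPLE_POOL, k=n)
    lines = [f"Design a novel PyTorch image-classification network for {task} "
             f"with {num_classes} classes. Return only runnable Python code."]
    for name, sketch in shots:
        lines.append(f"\nExample ({name}):\n{sketch}")
    lines.append("\nNew architecture:")
    return "\n".join(lines)

if __name__ == "__main__":
    # n=3 is the setting the study reports as best for vision tasks.
    print(build_fsap_prompt("CIFAR-100", num_classes=100, n=3))
```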

Results demonstrate that balanced mean accuracy peaks at 53.1% with three supporting examples, a 1.6% improvement over the baseline, while performance degrades with n>3, with catastrophic failure observed at n=6. The innovative Whitespace-Normalized Hash Validation method achieved a 100x speedup over traditional methods, completing deduplication in less than 1ms and saving an estimated 200 to 300 GPU hours. Further analysis revealed successful architectural synthesis, such as combining ResNet-style residual units with AlexNet’s classifier design, and merging Dual Path Network (DPN) blocks with a convolutional backbone. Generated models exhibited creative features like unusual channel configurations and hierarchical residual units with multi-scale features, demonstrating the LLM’s capacity for innovation. Comparisons with models generated using only one supporting example (n=1) showed a clear difference, with those models exhibiting shallow variation and lacking modern architectural patterns.
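The article describes Whitespace-Normalized Hash Validation only at a high level; a minimal sketch of the idea, assuming the canonical form is simply the generated source code with all whitespace collapsed, might look like the following. The `DeduplicatingValidator` class and the normalization rule are assumptions for illustration, not the authors' code.

```python
# Minimal sketch of whitespace-normalized hash validation for deduplicating
# generated architectures before training. The normalization rule (collapse
# all whitespace to single spaces) is an assumption about the canonical form.
import hashlib
import re

def normalized_hash(source: str) -> str:
    """Collapse whitespace so formatting-only differences hash identically."""
    canonical = re.sub(r"\s+", " ", source).strip()
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

class DeduplicatingValidator:
    def __init__(self):
        self._seen: set[str] = set()

    def is_duplicate(self, generated_code: str) -> bool:
        """Return True if an identically-normalized design was already accepted."""
        digest = normalized_hash(generated_code)
        if digest in self._seen:
            return True
        self._seen.add(digest)
        return False

if __name__ == "__main__":
    v = DeduplicatingValidator()
    a = "class Net:\n    def forward(self, x):\n        return x"
    b = "class Net:\n  def forward(self, x):   return x"  # whitespace-only variant
    print(v.is_duplicate(a))  # False: first time this design is seen
    print(v.is_duplicate(b))  # True: same design modulo whitespace
```

Because hashing a short string takes microseconds, a check like this can reject an exact duplicate long before any GPU time is committed, which is where the reported savings of 200 to 300 GPU hours would come from.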

Prompting Optimises Neural Network Design Performance

This research presents significant advances in automating the design of neural networks for computer vision by leveraging the capabilities of large language models. The scientists demonstrate that carefully crafted prompts, utilising just three supporting examples, optimise performance across a range of vision tasks, from digit recognition to image classification and scene understanding. The approach achieves substantial improvements, notably an 11.6% increase in performance on the challenging CIFAR-100 dataset, and establishes a clear upper limit on the number of prompt examples, beyond which performance rapidly declines. The team also developed a highly efficient validation step, Whitespace-Normalized Hash Validation, which dramatically speeds up the identification and elimination of duplicate network designs, saving substantial computational resources. Crucially, they introduced a dataset-balanced evaluation methodology that addresses a fundamental flaw in existing comparative analyses by mitigating statistical biases arising from the varying difficulty and characteristics of different vision datasets.
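The dataset-balanced evaluation is described only in outline here, so the sketch below shows one plausible reading: average accuracy within each benchmark first, then across benchmarks, so that no single dataset dominates the overall score. The equal weighting and the `balanced_mean_accuracy` helper are assumptions for illustration, not the paper's exact protocol.

```python
# Sketch of a dataset-balanced mean accuracy. The equal-weight average over
# datasets is an illustrative assumption about the balancing scheme.
from collections import defaultdict
from statistics import mean

def balanced_mean_accuracy(results: list[tuple[str, float]]) -> float:
    """results: (dataset_name, accuracy) pairs for every evaluated model.
    Average within each dataset first, then across datasets, so benchmarks
    with many models or easy tasks do not dominate the overall score."""
    per_dataset = defaultdict(list)
    for dataset, acc in results:
        per_dataset[dataset].append(acc)
    return mean(mean(accs) for accs in per_dataset.values())

if __name__ == "__main__":
    runs = [("MNIST", 0.99), ("MNIST", 0.98), ("CIFAR-100", 0.41), ("Places365", 0.22)]
    print(f"balanced mean accuracy: {balanced_mean_accuracy(runs):.3f}")
```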

LLMs Design Novel Computer Vision Networks

Large language models (LLMs) present a promising alternative to computationally intensive Neural Architecture Search (NAS), and this work validates their application to computer vision architecture generation. Building on the NNGPT/LEMUR framework, the scientists demonstrate that LLMs can effectively generate novel neural network architectures. The study establishes rigorous evaluation practices and provides actionable guidelines for LLM-based automated design, making the technology more accessible to researchers. The team found that using three supporting examples best balances architectural diversity and contextual focus for vision tasks. The research also highlights the capacity of LLMs to produce innovative features, such as unusual channel configurations and hierarchical residual units with multi-scale features, opening new avenues for neural network design.
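As an illustration of the kind of hybrid described above, combining ResNet-style residual units with an AlexNet-style fully connected classifier, the PyTorch sketch below shows what such a synthesis could look like. The `HybridNet` module and its layer sizes are invented for illustration and are not a model generated in the study.

```python
# Illustrative PyTorch sketch of a ResNet/AlexNet hybrid of the kind the
# article describes: residual units feeding a dropout-heavy fully connected
# classifier head. Layer sizes are arbitrary illustrative choices.
import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # ResNet-style skip connection around the convolutional body.
        return torch.relu(x + self.body(x))

class HybridNet(nn.Module):
    def __init__(self, num_classes: int = 100):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.blocks = nn.Sequential(ResidualUnit(64), ResidualUnit(64))
        self.pool = nn.AdaptiveAvgPool2d((6, 6))
        # AlexNet-style classifier: dropout plus two wide fully connected layers.
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(64 * 6 * 6, 1024), nn.ReLU(inplace=True),
            nn.Dropout(0.5), nn.Linear(1024, num_classes),
        )

    def forward(self, x):
        x = self.pool(self.blocks(self.stem(x)))
        return self.classifier(torch.flatten(x, 1))

if __name__ == "__main__":
    print(HybridNet(num_classes=100)(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 100])
```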

👉 More information
🗞 Enhancing LLM-Based Neural Network Generation: Few-Shot Prompting and Efficient Validation for Automated Architecture Design
🧠 ArXiv: https://arxiv.org/abs/2512.24120

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
