Network pruning is crucial for deploying deep learning models on devices with limited resources, but current methods often fail to reliably control computational cost. Researchers Shahrzad Esmat, Mahdi Banisharif, and Ali Jannesari, all from Iowa State University, present AgenticPruner, a novel framework that tackles this challenge by directly optimising for Multiply-Accumulate (MAC) operation budgets using large language models. Their approach employs a system of interacting ‘agents’ (Profiling, Master, and Analysis), with the Analysis Agent leveraging Claude 3.5 Sonnet to learn effective pruning strategies from past attempts, significantly boosting convergence rates from 48% to 71%. This research is significant because it moves beyond simple parameter reduction to deliver predictable inference latency and accuracy, demonstrated through strong results on ResNet, ConvNeXt, and DeiT models, paving the way for more efficient and reliable deep learning deployments.
Background
Scientists have developed AgenticPruner, a novel framework that directly optimises for target Multiply-Accumulate (MAC) budgets in neural network pruning, achieving predictable computational costs for deployment on resource-constrained devices. This breakthrough inverts the traditional pruning paradigm by prioritising computational efficiency over simple parameter reduction, a common limitation of existing methods. The research team achieved this by coordinating three specialised agents: a Profiling Agent to analyse model architecture and MAC distributions, a Master Agent to oversee the workflow and monitor divergence, and a crucial Analysis Agent powered by Claude 3.5 Sonnet, which learns optimal pruning strategies from iterative attempts. Through in-context learning, the Analysis Agent significantly improves convergence success rates from 48% to 71% compared to conventional grid search, demonstrating a substantial leap in efficiency.
Building upon the foundations of isomorphic pruning, a technique utilising graph-based structural grouping, AgenticPruner introduces context-aware adaptation through large language model (LLM) reasoning. The Analysis Agent examines historical pruning results, identifying patterns in both successful and failed attempts to predict optimal configurations for subsequent iterations. This learning mechanism enables automatic convergence to target MAC budgets within user-defined tolerance bands, typically achieving +1% to +5% overshoot and -5% to -15% undershoot, eliminating the need for extensive manual hyperparameter tuning. On ImageNet-1K, ResNet-50 pruned to 1.77G MACs maintains 77.04% accuracy, a 0.91% improvement over the baseline. Similarly, ResNet-101 achieves 4.22G MACs with 78.94% accuracy, representing a 1.56% accuracy gain. For ConvNeXt-Small, pruning to 8.17G MACs yields a 1.41x GPU and 1.07x CPU speedup, accompanied by a 45% reduction in parameters. These results highlight the framework’s ability to achieve precise MAC targeting while simultaneously improving or maintaining model accuracy. This work establishes the feasibility of deploying deep learning models in scenarios demanding strict computational guarantees, such as mobile and embedded platforms. The multi-agent coordination architecture, combined with LLM-guided strategy learning, opens new avenues for automated neural network compression. The research contributes a fundamental shift in pruning methodology, moving from parameter-centric approaches to direct MAC-budget optimisation.
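The asymmetric tolerance band reported here can be made concrete with a small check. This is an illustrative sketch under stated assumptions: the function name and exact band semantics are ours, not the paper's code.

```python
def within_tolerance(measured_macs, target_macs, overshoot=0.05, undershoot=0.15):
    # Hypothetical helper: accept a pruned model whose MAC count falls inside
    # the asymmetric band the summary reports (up to +5% over the budget,
    # up to -15% under it).
    upper = target_macs * (1 + overshoot)
    lower = target_macs * (1 - undershoot)
    return lower <= measured_macs <= upper
```

For a 1.80G MAC target, a 1.77G result would pass, while 2.50G (39% over) or 1.40G (22% under) would trigger another pruning revision.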
Multi-Agent MAC Optimisation via Iterative Pruning
Scientists developed AgenticPruner, a novel framework leveraging large language models to optimise deep learning models for resource-constrained devices. This work directly addresses the challenge of unpredictable inference latency by focusing on Multiply-Accumulate (MAC) operation budgets, rather than simply reducing parameters. The study pioneered a multi-agent system coordinating three specialised agents to achieve MAC-constrained optimisation through iterative strategy learning. Initially, a Profiling Agent meticulously analysed both model architecture and baseline MAC distributions across different layer types.
Subsequently, the Master Agent orchestrated the entire workflow, continuously monitoring for divergence and maintaining a comprehensive pruning history, crucial for informed decision-making. Central to this innovation was the Analysis Agent, powered by Claude 3.5 Sonnet, which learned optimal pruning strategies from previous attempts via in-context learning. This agent demonstrably improved convergence success rates from 48% to 71% when compared to traditional grid search methods, showcasing a significant methodological advancement. The team engineered this system to build upon the foundations of isomorphic pruning, utilising its graph-based structural grouping approach for efficient pruning.
Furthermore, the research added context-aware adaptation by analysing patterns across multiple pruning iterations, enabling automatic convergence to user-defined MAC budgets within specified tolerance bands. Experiments employed ImageNet-1K across ResNet, ConvNeXt, and DeiT architectures to rigorously validate the framework’s performance. On CNNs, the approach achieved precise MAC targeting while simultaneously maintaining or even improving accuracy; for example, ResNet-50 reached 1.77G MACs with 77.04% accuracy, a 0.91% improvement over the baseline. ResNet-101 achieved 4.22G MACs with 78.94% accuracy, representing a 1.56% accuracy gain.
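The iterative convergence described above can be sketched as a feedback loop over a pruning history. Everything here (a single uniform keep ratio, quadratic MAC scaling, the square-root update rule) is a deliberate simplification of the agent loop, not the authors' implementation:

```python
def count_macs(layer_macs, keep_ratio):
    # Under structured pruning, a layer's MACs scale roughly with the product
    # of its input and output channel keep ratios.
    return sum(m * keep_ratio * keep_ratio for m in layer_macs)

def prune_to_budget(layer_macs, target_macs, tol=0.05, max_revisions=5):
    keep_ratio, history = 1.0, []
    for _ in range(max_revisions):
        measured = count_macs(layer_macs, keep_ratio)
        history.append((keep_ratio, measured))  # the accumulated pruning history
        if abs(measured - target_macs) <= tol * target_macs:
            break  # converged inside the tolerance band
        # Propose the next ratio from the gap: MACs are quadratic in
        # keep_ratio here, so scale by the square root of target/measured.
        keep_ratio *= (target_macs / measured) ** 0.5
    return keep_ratio, history
```

In AgenticPruner the "propose the next ratio" step is where the LLM-driven Analysis Agent replaces this fixed rule, reading the whole history rather than only the last gap.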
For ConvNeXt-Small, pruning to 8.17G MACs yielded a 1.41x GPU and 1.07x CPU speedup alongside a 45% reduction in parameters. On Vision Transformers, the study demonstrated consistent MAC-budget compliance, typically within a +1% to +5% overshoot and -5% to -15% undershoot, establishing the feasibility of deployment in scenarios demanding strict computational guarantees, a critical breakthrough for edge computing applications. This method achieves a level of precision previously unattainable in network pruning, offering a robust solution for deploying complex models on limited hardware.
AgenticPruner Optimises MAC Budgets and Accuracy
Scientists achieved direct optimisation of Multiply-Accumulate (MAC) budgets in deep learning models using a novel framework called AgenticPruner. The research team developed a multi-agent system to coordinate pruning strategies, moving beyond traditional parameter-centric approaches to focus on computational cost. Experiments revealed that the Analysis Agent, powered by Claude 3.5 Sonnet, improved convergence success rate from 48% to 71% compared to grid search, demonstrating a significant advancement in pruning efficiency. The team measured performance across ResNet, ConvNeXt, and DeiT architectures on the ImageNet-1K dataset, achieving remarkable results in MAC targeting and accuracy.
ResNet-50 reached 1.77G MACs with 77.04% accuracy, a 0.91% improvement over the baseline. Furthermore, ResNet-101 achieved 4.22G MACs with 78.94% accuracy, representing a substantial 1.56% accuracy gain compared to the original model. These measurements confirm the framework’s ability to maintain or even enhance accuracy while drastically reducing computational demands. For ConvNeXt-Small, pruning to 8.17G MACs yielded a 1.41x speedup on GPUs and a 1.07x speedup on CPUs, alongside a 45% reduction in parameters. Data shows that this combination of speed and efficiency is crucial for deploying complex models on resource-constrained devices.
The Profiling Agent meticulously analysed model architecture and MAC distributions, while the Master Agent monitored divergence and maintained a comprehensive pruning history. Tests show that AgenticPruner complies with user-defined MAC budget tolerance bands on Vision Transformers, typically achieving +1% to +5% overshoot and -5% to -15% undershoot. This level of precision is vital for applications requiring strict computational guarantees, such as real-time image processing and embedded systems. The framework converges automatically to target MAC budgets within 3-5 revisions, eliminating the need for extensive manual hyperparameter tuning. This automated refinement process significantly accelerates the development and deployment of efficient deep learning models.
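In-context learning here means the pruning history itself is serialised into the LLM's prompt each revision. A hypothetical prompt builder might look like the following; the actual prompts sent to Claude 3.5 Sonnet are not given in this summary, so the wording and history format are assumptions:

```python
def build_analysis_prompt(target_macs, history):
    # history: list of (keep_ratio, measured_macs) pairs from past revisions.
    lines = [
        f"Target budget: {target_macs / 1e9:.2f}G MACs.",
        "Previous pruning attempts (keep ratio -> measured MACs):",
    ]
    for ratio, macs in history:
        lines.append(f"  keep_ratio={ratio:.2f} -> {macs / 1e9:.2f}G")
    lines.append("Propose the next keep ratio likely to land inside the tolerance band.")
    return "\n".join(lines)
```

Feeding successes and failures back in this way is what lets the model infer a better next configuration instead of blindly sweeping a grid.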
AgenticPruner delivers MAC-optimised neural network pruning for efficient deployment
Scientists have developed AgenticPruner, a novel framework for pruning neural networks to meet strict computational budgets, specifically targeting Multiply-Accumulate (MAC) operations. This approach moves beyond simply reducing the number of parameters, instead directly optimising for predictable inference latency on resource-constrained devices. The framework employs a multi-agent system coordinated by a large language model, enabling iterative strategy learning and automatic convergence to desired MAC targets. The research makes three key contributions: formulating pruning as a MAC-budget optimisation problem with realistic tolerance bands, designing a specialised multi-agent architecture with profiling, master, and analysis agents, and enabling automatic iterative refinement to achieve target MAC budgets without manual configuration.
Validation on ImageNet-1K with ResNet, ConvNeXt, and DeiT architectures demonstrates successful MAC-budget compliance and, in some cases, improved accuracy compared to unpruned baselines; notably, ResNet-50 and ResNet-101 showed accuracy gains of +0.91% and +1.56% respectively. While accuracy on Vision Transformers lagged behind some specialised methods, the framework consistently met the defined MAC budgets, prioritising predictable computational costs. The authors acknowledge that the architecture’s sensitivity to structured pruning and complex inter-layer dependencies present challenges for accurate MAC prediction. They also note a trade-off between accuracy and strict MAC-budget adherence, deliberately prioritising computational predictability. Future research will focus on incorporating additional hardware-aware constraints like memory bandwidth and cache utilisation, exploring dynamic MAC budgets for adaptive inference, and refining LLM prompting strategies to further improve convergence speed and accuracy, particularly for transformer architectures.
👉 More information
🗞 AgenticPruner: MAC-Constrained Neural Network Compression via LLM-Driven Strategy Search
🧠 ArXiv: https://arxiv.org/abs/2601.12272
