On April 21, 2025, researchers Andy Wanna, Hanqiu Chen, and Cong Hao introduced "ForgeBench: A Machine Learning Benchmark Suite and Auto-Generation Framework for Next-Generation HLS Tools," a work addressing the limitations of current High-Level Synthesis (HLS) tools in modern hardware design.
High-Level Synthesis (HLS) has yet to become mainstream, in part because existing benchmarks are outdated for modern machine learning applications and current tools focus on individual accelerators rather than architecture-oriented design. To address these limitations, the researchers introduced ForgeBench, an ML-focused benchmark suite built on a hardware design auto-generation framework. It includes over 6,000 representative ML HLS designs, plus a second suite highlighting resource sharing, to underscore the need for architecture-oriented design in future HLS tools.
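To make the resource-sharing theme concrete, here is a minimal sketch of the kind of trade-off such a benchmark can expose, written as a Vitis HLS C++ kernel. The kernel and its pragma choice are illustrative assumptions, not designs taken from ForgeBench, and the ALLOCATION syntax shown is the Vitis HLS form (older Vivado HLS releases order the arguments differently).

    // Hypothetical HLS kernel illustrating resource sharing (not from ForgeBench).
    // Capping multiplier instances forces the scheduler to reuse one DSP-mapped
    // multiplier across iterations, trading latency for area.
    #define N 64

    int dot_product(const int a[N], const int b[N]) {
        int acc = 0;
        for (int i = 0; i < N; i++) {
            // Restrict this loop to a single hardware multiplier.
            #pragma HLS ALLOCATION operation instances=mul limit=1
            acc += a[i] * b[i];
        }
        return acc;
    }

Raising the limit, or removing the pragma and unrolling the loop instead, buys throughput with more multipliers; exploring that trade-off automatically is exactly the kind of architecture-oriented decision such a suite is meant to stress.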
Machine learning (ML) has become integral to modern technology, driving advancements in artificial intelligence, data analysis, and automation. As ML models grow more complex, the demand for efficient computation has intensified. While graphics processing units (GPUs) have traditionally been used to accelerate ML workloads, they present limitations in power consumption and flexibility. Field-Programmable Gate Arrays (FPGAs), however, are emerging as a promising alternative. These devices offer customizable hardware architectures that can be tailored to specific tasks, addressing the need for energy-efficient and high-performance computing solutions in machine learning.
This research introduces an innovative approach to FPGA-based ML acceleration, focusing on optimizing both hardware design and software tools. By leveraging domain-specific architectures (DSAs) and advanced high-level synthesis (HLS) frameworks, researchers have developed methods that improve FPGA performance on ML workloads. Central to this effort is DSAgen, a tool for synthesizing programmable spatial accelerators. DSAgen automates the mapping of ML algorithms onto FPGA hardware, producing efficient and scalable designs while reducing development time and enabling FPGAs to handle complex models.
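The sketch below illustrates the general idea: the computation is captured as a dataflow graph whose nodes a spatial-accelerator generator can map onto processing elements. This is a hypothetical C++ illustration of the concept only; it is not DSAgen's actual input format or API.

    // Hypothetical compute-graph representation (NOT DSAgen's real API):
    // y = relu(w * x + b) as a dataflow graph a generator could map to hardware.
    #include <string>
    #include <vector>

    struct Node {                 // one operation in the dataflow graph
        std::string op;           // e.g. "mul", "add", "relu"
        std::vector<int> inputs;  // indices of producer nodes
    };

    int main() {
        std::vector<Node> graph = {
            {"input",  {}},       // 0: activation x
            {"weight", {}},       // 1: weight w
            {"bias",   {}},       // 2: bias b
            {"mul",  {0, 1}},     // 3: w * x
            {"add",  {3, 2}},     // 4: w * x + b
            {"relu", {4}},        // 5: output activation
        };
        (void)graph;  // a real flow would schedule the graph and emit RTL/HLS here
        return 0;
    }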
The methodology involves a multi-step optimization process. Researchers first analyze target ML models to identify their computational requirements and bottlenecks. Using tools such as Vitis HLS, they translate these insights into hardware designs that maximize parallelism and minimize resource usage. Domain-specific architectures allow FPGAs to be configured for specific computations, such as image recognition with EfficientNet or MobileNets, or deep learning inference with ResNet architectures. This specialization improves efficiency, yielding faster inference and lower power consumption than general-purpose GPUs.
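As a minimal sketch of how such analysis turns into hardware, assuming a Vitis HLS C++ flow, the kernel below uses standard partitioning and unrolling directives to expose parallelism; the kernel itself is illustrative and not one of the designs from the research.

    // Illustrative multiply-accumulate stage with parallelism directives
    // (a generic example, not a kernel from the paper).
    #define C 8   // number of input channels

    void mac_layer(const int in[C], const int w[C], int *out) {
        // Partition the arrays so every channel can be read in the same cycle.
        #pragma HLS ARRAY_PARTITION variable=in complete
        #pragma HLS ARRAY_PARTITION variable=w complete

        int acc = 0;
        for (int c = 0; c < C; c++) {
            // Fully unroll: one multiplier per channel, all active in parallel.
            #pragma HLS UNROLL
            acc += in[c] * w[c];
        }
        *out = acc;
    }

Dropping the UNROLL directive and pipelining the loop instead would serialize the multiplications onto fewer multipliers, the resource-frugal end of the same design space.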
The research demonstrates the potential of FPGA-based accelerators to transform ML workloads. The optimized designs achieve performance improvements across benchmarks: FPGAs configured with DSAgen deliver lower inference latency than GPUs while consuming less power, making them a sustainable choice for ML applications.
FPGA-based acceleration represents a promising direction for advancing machine learning. By offering customizable architectures and efficient computation, FPGAs address the limitations of traditional GPU-based approaches. As research continues, FPGAs are expected to play an increasingly important role in shaping the future of ML, enabling faster, more energy-efficient solutions across various applications.
👉 More information
🗞 ForgeBench: A Machine Learning Benchmark Suite and Auto-Generation Framework for Next-Generation HLS Tools
🧠 DOI: https://doi.org/10.48550/arXiv.2504.15185
