Flow-matching models represent a leading approach to high-fidelity image and video generation, yet their sequential nature often limits generation speed. Divya Jyoti Bajpai and Aashay Sandansing from the Indian Institute of Technology Bombay, working with Dhruv Bhardwaj, Soumya Roy, Tejas Duseja, Harsh Agarwal, and Manjesh K. Hanawal from Amazon, present FastFlow, a novel plug-and-play framework designed to accelerate these models without the need for retraining. FastFlow intelligently identifies and approximates denoising steps with minimal impact on quality, leveraging finite-difference velocity estimates and modelling step skipping as a multi-armed bandit problem to optimise the trade-off between speed and performance. This research is significant because it offers a generalisable and computationally efficient solution, achieving over a 2.6x speedup across image generation, video generation, and editing tasks while preserving output quality, and represents a substantial advance in the practical application of flow-matching techniques.
Scientists have developed a new adaptive inference framework, FastFlow, that significantly accelerates image and video generation using flow matching models. These models, known for producing high-quality visuals, are typically limited by a slow, sequential denoising process. FastFlow overcomes this limitation by intelligently identifying and approximating denoising steps that contribute minimally to the overall image or video quality. The core innovation lies in extrapolating future states using prior predictions and finite-difference velocity estimates, effectively skipping computationally expensive steps without sacrificing fidelity. This approach hinges on a multi-armed bandit (MAB), a computational strategy where the system learns to balance speed and accuracy by dynamically deciding how many steps can be safely approximated. The adaptive nature of the MAB allows the system to respond to the complexity of each input, ensuring efficient allocation of computational resources. The research addresses a critical bottleneck in generative artificial intelligence, where increasing model size and resolution demand ever-greater computational resources. Existing acceleration techniques, such as distillation and trajectory truncation, often require retraining or struggle to adapt to different tasks. FastFlow distinguishes itself by being a “plug-and-play” solution, seamlessly integrating with existing pipelines without the need for additional training or complex network architectures. This allows for a more efficient and tailored approach to generative tasks, demonstrating a speedup exceeding 2.6x across various applications, including image generation, video generation, and image editing, while maintaining high-quality outputs. The framework leverages the observation that flow-matching models often exhibit approximately linear denoising trajectories, enabling the use of Taylor series expansions for accurate state approximation. 
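The extrapolation idea described above can be sketched in a few lines. The snippet below is an illustrative toy, not the paper's implementation: it assumes the denoising trajectory is locally near-linear, estimates the velocity from the two most recent states by finite differences, and takes a first-order Taylor (Euler) step to predict the next state without calling the model. The function name and signature are hypothetical.

```python
import numpy as np

def fd_extrapolate(x_prev, x_curr, t_prev, t_curr, t_next):
    """Approximate the next state on a flow-matching trajectory
    without a model call, using a finite-difference velocity
    estimate from the two most recent states.

    Hypothetical sketch: accuracy rests on the observation that
    flow-matching denoising trajectories are often nearly linear.
    """
    # Finite-difference estimate of the velocity dx/dt.
    v_est = (x_curr - x_prev) / (t_curr - t_prev)
    # First-order Taylor (Euler) extrapolation to the next timestep.
    return x_curr + (t_next - t_curr) * v_est

# Toy usage on an exactly linear trajectory x(t) = 2t, where the
# first-order extrapolation is exact.
x0, x1 = np.array([0.0]), np.array([0.2])
x2 = fd_extrapolate(x0, x1, 0.0, 0.1, 0.2)
```

On real trajectories the step is only approximate, which is precisely why FastFlow must decide adaptively when such skips are safe.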
A theoretical bound has also been established, quantifying the deviation between the approximated and full-model trajectories. Specifically, the framework aims to maximise cumulative reward, defined as a trade-off, weighted by a scalar μ, between the number of skipped steps and the discrepancy between approximated and true velocities. The cumulative error in the final state after T steps is bounded by O(|S|T³), where |S| denotes the number of skipped steps, so the final error grows only linearly in the number of skips. Algorithm 1 details the implementation, initialising the bandits with one full generation so that every possible skip length is explored at least once. Computational complexity remains low, as the multi-armed bandits employed by FastFlow only maintain a list of rewards, adding negligible overhead to the overall process. The framework integrates seamlessly with existing pipelines and generalises across image generation, video generation, and editing tasks, consistently delivering substantial acceleration without compromising output fidelity. The research team modelled the decision-making process as a multi-armed bandit (MAB) problem, a framework commonly used in reinforcement learning. Each ‘arm’ of the bandit represents a different number of denoising steps to approximate before requiring a full model computation. The bandit algorithm learns, through trial and error, the optimal number of skips to balance inference speed with the preservation of output quality. A reward signal is generated based on the accuracy of the approximation, incentivising the bandit to favour strategies that maximise both speed and fidelity. The relentless demand for more realistic and detailed images and videos is pushing generative models to their computational limits. Flow-matching techniques have emerged as a leading approach, delivering impressive results, but their step-by-step denoising process is inherently slow.
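To make the bandit formulation concrete, here is a minimal epsilon-greedy sketch in which each arm is a candidate skip length and the reward trades off steps saved against approximation error, weighted by a scalar mu as in the cumulative reward described above. The class name, the epsilon-greedy policy, and the exact reward shaping are assumptions for illustration, not the paper's algorithm; note that, as in the text, each arm only maintains its list of past rewards.

```python
import random

class SkipBandit:
    """Illustrative epsilon-greedy multi-armed bandit over skip lengths.

    Hypothetical sketch: each arm = number of denoising steps to
    approximate before the next full model call (0 = never skip).
    """

    def __init__(self, max_skip, eps=0.1):
        self.arms = list(range(max_skip + 1))
        self.rewards = {a: [] for a in self.arms}  # per-arm reward history
        self.eps = eps

    def select(self):
        # Play each arm once first, then explore with probability eps,
        # otherwise exploit the arm with the best average reward.
        unplayed = [a for a in self.arms if not self.rewards[a]]
        if unplayed:
            return unplayed[0]
        if random.random() < self.eps:
            return random.choice(self.arms)
        return max(self.arms,
                   key=lambda a: sum(self.rewards[a]) / len(self.rewards[a]))

    def update(self, arm, n_skipped, velocity_error, mu=1.0):
        # Reward = mu-weighted steps saved minus the discrepancy between
        # approximated and true velocities (assumed reward shaping).
        self.rewards[arm].append(mu * n_skipped - velocity_error)

# Toy usage: with eps=0, the bandit first tries every arm once, then
# commits to the arm with the best observed speed/error trade-off.
bandit = SkipBandit(max_skip=2, eps=0.0)
bandit.update(0, 0, 0.0)    # no skip, no error
bandit.update(1, 1, 0.05)   # skip 1, small error
bandit.update(2, 2, 0.10)   # skip 2, slightly larger error
best = bandit.select()
```

Because the state kept per arm is just a reward list, the bookkeeping overhead is negligible next to a single forward pass of the generative model.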
FastFlow’s ingenuity lies in its ability to assess the impact of skipping certain computational steps, framing the decision-making process as a multi-armed bandit problem and intelligently balancing speed and fidelity. This isn’t simply about making the existing process faster; it’s about fundamentally rethinking how these models are deployed, moving beyond fixed acceleration strategies towards a dynamic, context-aware approach. The potential for real-world applications is considerable, from real-time content creation and editing to more responsive virtual and augmented reality experiences. However, the theoretical error bounds rely on assumptions about the smoothness of the underlying data, and complex scenes with sharp edges or intricate textures could pose challenges. The next step will likely involve exploring more sophisticated bandit algorithms and investigating how to incorporate perceptual metrics directly into the reward function, ensuring that the accelerated outputs remain visually compelling. Ultimately, the success of FastFlow, and similar adaptive techniques, will depend on bridging the gap between theoretical guarantees and the realities of real-world data. The source code is publicly available, facilitating further research and development in this rapidly evolving field.
👉 More information
🗞 FastFlow: Accelerating The Generative Flow Matching Models with Bandit Inference
🧠 ArXiv: https://arxiv.org/abs/2602.11105
