NVIDIA Builds Framework to Accelerate Simulation Data for AI

NVIDIA is introducing Warp, a new framework designed to accelerate the creation of simulation data crucial for advancing artificial intelligence, particularly physics-based AI models. Unlike models trained on language, these AI systems require vast amounts of precise, physics-compliant data, and NVIDIA asserts that generating this data is often the primary cost driver in development. Warp bridges CUDA and Python, allowing developers to write high-performance kernels as standard Python functions that are then efficiently compiled for NVIDIA GPUs. This approach differs from existing tensor-based frameworks by enabling flexible, element-by-element control within computational grids; NVIDIA explains that in tensor frameworks, these patterns require composing Boolean masks that quickly become unwieldy and can waste computation on irrelevant elements. Warp’s capabilities extend to automatic differentiation, seamlessly integrating with optimization and training workflows for applications ranging from robotics to geometry processing.

Warp Framework Accelerates GPU-Based Physics Simulation

The demand for increasingly realistic and detailed physics simulations is driving a fundamental shift in computational engineering, and NVIDIA’s Warp framework is positioned to address a critical bottleneck. Unlike traditional approaches reliant on human-driven workflows, the industry is rapidly adopting AI-driven methods, particularly physics foundation models capable of generalizing across diverse geometries and operating conditions. However, these advanced models are heavily dependent on vast quantities of high-fidelity, physics-compliant data, and the generation of that data is often the limiting factor. As NVIDIA puts it, “simulation-generated training data is often the limiting cost in practice,” highlighting the need for a simulator that is both GPU-native and seamlessly integrated with machine learning pipelines. Warp distinguishes itself by bridging the gap between the low-level control of CUDA and the high-level expressiveness of Python. This approach contrasts sharply with tensor-based frameworks, where computation is expressed as operations on entire multi-dimensional arrays.
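This difference is easy to illustrate outside of Warp. The following NumPy sketch (illustrative only, not code from NVIDIA’s post) contrasts a tensor-style masked update, which evaluates the expensive expression for every element before discarding most of the results, with per-element control flow that skips irrelevant work entirely:

```python
import numpy as np

def masked_update(x):
    # Tensor-style: the "expensive" expression is evaluated for ALL
    # elements, then a Boolean mask selects which results to keep.
    mask = x > 0.5
    expensive = np.sqrt(x) * np.log1p(x)   # computed even where mask is False
    return np.where(mask, expensive, x)

def per_element_update(x):
    # Kernel-style: each "thread" branches independently and can skip
    # the expensive work for elements that do not need it.
    out = np.empty_like(x)
    for i in range(x.size):
        v = x.flat[i]
        out.flat[i] = np.sqrt(v) * np.log1p(v) if v > 0.5 else v
    return out

x = np.random.default_rng(0).random(1000)
assert np.allclose(masked_update(x), per_element_update(x))
```

In Warp, the second pattern runs as one GPU thread per element; the Python loop here merely stands in for that per-thread branching.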

Warp instead allows for the creation of flexible kernels that execute simultaneously across all elements of a computational grid, offering significant advantages when dealing with complex, data-dependent control flow. Warp’s native support for automatic differentiation allows solvers to be easily differentiated, yielding gradients accurate to machine precision without the need for manual derivation or step-size tuning. Warp’s interoperability with frameworks like PyTorch, JAX, and NumPy expands its potential applications beyond simulation, encompassing fields like robotics, perception, and geometry processing. A recent demonstration involved building a two-dimensional Navier-Stokes solver entirely within Warp, showcasing how the programming model maps directly onto a partial differential equation solver.
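The machine-precision claim can be sanity-checked with a toy forward-mode autodiff, sketched below (a minimal dual-number class, not Warp’s actual tape-based implementation): the propagated derivative matches the analytic gradient exactly, while a finite-difference estimate is limited by its step size h:

```python
import math

class Dual:
    """Minimal forward-mode autodiff: carries a value and a derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)
    __rmul__ = __mul__
    def sin(self):
        return Dual(math.sin(self.val), math.cos(self.val) * self.dot)

def f(x):
    # f(x) = x * sin(x), with f'(x) = sin(x) + x * cos(x)
    return x * x.sin() if isinstance(x, Dual) else x * math.sin(x)

x0 = 1.3
exact = math.sin(x0) + x0 * math.cos(x0)

# Forward-mode AD: seed dx/dx = 1 and read off df/dx exactly.
ad_grad = f(Dual(x0, 1.0)).dot

# Central finite differences: accuracy depends on the step size h.
h = 1e-5
fd_grad = (f(x0 + h) - f(x0 - h)) / (2 * h)

assert abs(ad_grad - exact) < 1e-15   # machine precision
assert abs(fd_grad - exact) > 1e-12   # limited by truncation/roundoff in h
```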

The example leverages the Fast Fourier Transform algorithm and tile-based primitives for efficient computation, with row-wise FFTs and transposes orchestrated through specialized Warp kernels. NVIDIA notes that “with all the building blocks in place, a single step() call advances the simulation by one timestep,” demonstrating the framework’s ability to streamline complex computational tasks.
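As a rough structural analogue, the sketch below shows what such a step() function looks like for the 2D vorticity equation. This is a deliberately simplified spectral version in NumPy with an explicit Euler update, not the article’s finite-difference SSP-RK3 code; the Taylor-Green mode used in the check decays at the exact rate exp(-2*nu*t) because its advection term vanishes:

```python
import numpy as np

n, nu, dt = 64, 0.1, 1e-3
k = np.fft.fftfreq(n, d=1.0 / n)            # integer wavenumbers on [0, 2*pi)
kx, ky = np.meshgrid(k, k, indexing="ij")
k2 = kx**2 + ky**2
k2_safe = np.where(k2 == 0, 1.0, k2)        # guard the mean mode

def step(omega):
    """Advance the 2D vorticity field by one explicit Euler timestep."""
    w_hat = np.fft.fft2(omega)
    psi_hat = w_hat / k2_safe               # streamfunction: psi_hat = w_hat / k^2
    u = np.real(np.fft.ifft2(1j * ky * psi_hat))    # u =  d(psi)/dy
    v = np.real(np.fft.ifft2(-1j * kx * psi_hat))   # v = -d(psi)/dx
    wx = np.real(np.fft.ifft2(1j * kx * w_hat))     # d(omega)/dx
    wy = np.real(np.fft.ifft2(1j * ky * w_hat))     # d(omega)/dy
    lap_w = np.real(np.fft.ifft2(-k2 * w_hat))      # laplacian(omega)
    return omega + dt * (nu * lap_w - u * wx - v * wy)

# Taylor-Green mode: advection vanishes, so omega decays as exp(-2*nu*t).
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
omega = 2.0 * np.sin(X) * np.cos(Y)
for _ in range(100):
    omega = step(omega)
decay = np.max(np.abs(omega)) / 2.0
assert abs(decay - np.exp(-2.0 * nu * 100 * dt)) < 1e-3
```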

2D Navier-Stokes Solved with Vorticity-Streamfunction Formulation

The pursuit of accurate and efficient computational fluid dynamics continues to drive innovation in scientific computing, with a recent demonstration showcasing a 2D Navier-Stokes solver built entirely within the NVIDIA Warp framework. While physics-based machine learning models are gaining traction, their success hinges on access to substantial volumes of high-fidelity data, a need now being addressed by advances in simulation technology. Warp is designed to accelerate simulation and data generation, bridging the gap between low-level CUDA programming and the accessibility of Python. This new solver leverages the vorticity-streamfunction formulation of the incompressible Navier-Stokes equations, a common approach for modeling fluid flow. The method centers on calculating how vorticity, a measure of local rotation, evolves over time, and then recovering the streamfunction, which describes the velocity field.
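In standard textbook notation (not quoted from NVIDIA’s post), the vorticity-streamfunction form of the incompressible 2D Navier-Stokes equations reads:

```latex
\begin{aligned}
\frac{\partial \omega}{\partial t}
  + u\,\frac{\partial \omega}{\partial x}
  + v\,\frac{\partial \omega}{\partial y} &= \nu \nabla^2 \omega, \\
\nabla^2 \psi &= -\omega, \qquad
u = \frac{\partial \psi}{\partial y}, \quad
v = -\frac{\partial \psi}{\partial x},
\end{aligned}
```

where \(\omega\) is the vorticity, \(\psi\) the streamfunction, \((u, v)\) the velocity components, and \(\nu\) the kinematic viscosity. The second equation is the Poisson problem the FFT-based solver addresses.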

A key optimization involves utilizing the Fast Fourier Transform algorithm to bypass iterative solving of the Poisson equation, reducing computational demands. The solver discretizes the equations on an N x N grid within an L x L square domain, advancing the solution in time using a third-order strong stability-preserving Runge-Kutta scheme. The architecture of the Warp-based solver is modular, built around two core components: a finite-difference discretization and time-marching kernel, and an FFT-based Poisson solver. The rk3_update() kernel, a central element, computes diffusion and advection terms, performing a single substep of the Runge-Kutta scheme. NVIDIA documentation explains that each thread maps to one grid point on the computational domain, and all N x N points are updated simultaneously with a single wp.launch() call, highlighting the parallel processing capabilities.
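The three-stage strong-stability-preserving RK3 scheme itself is standard (the Shu-Osher form). The sketch below is a minimal NumPy version of one step, assuming a generic right-hand-side function rather than Warp’s rk3_update() kernel, checked against the exact solution of d(omega)/dt = -omega:

```python
import numpy as np

def ssp_rk3_step(omega, dt, rhs):
    """One step of the three-stage strong-stability-preserving RK3
    (Shu-Osher form); rhs(omega) returns d(omega)/dt."""
    w1 = omega + dt * rhs(omega)                       # stage 1: full Euler step
    w2 = 0.75 * omega + 0.25 * (w1 + dt * rhs(w1))     # stage 2: convex combination
    return omega / 3.0 + 2.0 / 3.0 * (w2 + dt * rhs(w2))  # stage 3

# Sanity check on d(omega)/dt = -omega, exact solution exp(-t):
rhs = lambda w: -w
w, dt = np.ones(4), 0.01
for _ in range(100):          # integrate to t = 1
    w = ssp_rk3_step(w, dt, rhs)
assert np.allclose(w, np.exp(-1.0), atol=1e-7)
```

The scheme is third-order accurate, so the error after 100 steps is well below the 1e-7 tolerance used here.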

The FFT Poisson solver utilizes Warp’s tile-based primitives, specifically wp.tile_fft() and wp.tile_ifft(), to efficiently perform forward and inverse FFTs on rows of the grid, then transposes the data for a full two-dimensional transformation. The framework’s native support for automatic differentiation allows for straightforward integration with optimization and training workflows, opening possibilities for solving optimal perturbation problems and other complex tasks. The team demonstrated this by differentiating through the simulation, a crucial step for applications involving data assimilation and control. The resulting simulation produces a visually compelling representation of decaying turbulence, showcasing the solver’s capabilities.
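The FFT route to the Poisson equation can be sketched in NumPy (again illustrative, not the Warp tile code): dividing the vorticity’s Fourier coefficients by \(k_x^2 + k_y^2\) recovers the streamfunction directly, with the zero-wavenumber mode pinned to zero:

```python
import numpy as np

def solve_poisson_fft(omega, L):
    """Solve laplacian(psi) = -omega on an N x N periodic domain of
    side L via FFT: psi_hat = omega_hat / (kx^2 + ky^2)."""
    n = omega.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)   # wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0                                 # avoid divide-by-zero at the mean mode
    psi_hat = np.fft.fft2(omega) / k2
    psi_hat[0, 0] = 0.0                            # zero-mean streamfunction
    return np.real(np.fft.ifft2(psi_hat))

# Verify against a known pair: psi = sin(x)cos(y) on [0, 2*pi)^2
# implies omega = -laplacian(psi) = 2*sin(x)*cos(y).
n, L = 64, 2.0 * np.pi
x = np.linspace(0.0, L, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing="ij")
psi_exact = np.sin(X) * np.cos(Y)
omega = 2.0 * np.sin(X) * np.cos(Y)
assert np.allclose(solve_poisson_fft(omega, L), psi_exact, atol=1e-12)
```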

On an NVIDIA L4 Tensor Core GPU, the Warp GPU backend achieved a speedup of up to 669x over optimized CPU baselines built on state-of-the-art libraries, including FCL and Embree.

RK3 Time-Stepping with Warp Kernel Discretization

Researchers at NVIDIA are increasingly focused on optimizing simulation workflows, and a recent implementation details how the Warp framework facilitates advanced numerical methods for computational fluid dynamics. A key component is the rk3_update() kernel, a function written in Python but compiled for efficient GPU execution. The approach leverages the single-instruction, multiple-threads paradigm: each thread updates a single grid point, so all grid points advance simultaneously with a single wp.launch() call. Beyond the time-stepping scheme, the solver incorporates a Fast Fourier Transform-based Poisson solver to calculate the streamfunction from the vorticity. The two-dimensional FFT is decomposed into row-wise operations, a transpose, and another row-wise operation, all orchestrated by Warp kernels. NVIDIA documentation explains that Warp tile-based primitives enable solving the Poisson equation in Fourier space, highlighting the framework’s utility in handling complex mathematical operations.
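The row-FFT/transpose decomposition is straightforward to verify in NumPy (a check of the underlying linear algebra, not of Warp’s tile kernels):

```python
import numpy as np

# A 2D FFT decomposes into row-wise 1D FFTs, a transpose, a second
# round of row-wise FFTs, and a transpose back -- the same structure
# described for the tiled Warp kernels.
rng = np.random.default_rng(0)
a = rng.standard_normal((32, 32))

rows = np.fft.fft(a, axis=1)            # row-wise FFTs
rows_t = rows.T                         # transpose
full = np.fft.fft(rows_t, axis=1).T     # row-wise FFTs again, transpose back

assert np.allclose(full, np.fft.fft2(a))
```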

The team further optimized the workflow by capturing the simulation step into a CUDA Graph using wp.ScopedCapture and replaying it with wp.capture_launch(), effectively eliminating per-launch overhead. This combination of efficient kernel implementation and workflow optimization is not just about speed; it’s about enabling differentiation through the simulation. The ability to accurately calculate how changes in initial conditions affect the simulation outcome opens up new possibilities for design optimization and control.

The Warp backend reaches up to 252x (locomotion) and 475x (manipulation) speedups over JAX on comparable hardware.

FFT-Based Poisson Solver via Tile Primitives

The demand for increasingly accurate and detailed simulations is driving innovation in computational methods, particularly in fields like fluid dynamics where complex interactions necessitate high-resolution modeling. A critical component of many such simulations is efficiently solving the Poisson equation, a task now being approached with a novel technique leveraging Fast Fourier Transforms and tile-based primitives within the NVIDIA Warp framework. This advancement is not merely about speed; it’s about enabling a new class of AI-driven engineering workflows that rely on vast quantities of physics-compliant data. The team found that traditional tensor-based frameworks can struggle with the data-dependent control flow inherent in many simulation kernels; Warp’s per-element programming model is particularly advantageous for partial differential equations like the Navier-Stokes equations, where calculations often vary significantly per element.

The team’s implementation of a 2D Navier-Stokes solver exemplifies this, utilizing the vorticity-streamfunction formulation and employing a textbook example of 2D decaying turbulence to focus on Warp’s capabilities rather than complex numerical methods. The solver reduces to an algebraic equation in Fourier space, bypassing the need for iterative solvers: \(\hat{\psi}_{m,n} = \frac{\hat{\omega}_{m,n}}{k_x^2 + k_y^2}\), where \((k_x, k_y)\) represents the wavenumber pair in Fourier space. This decomposition allows for efficient parallel processing. The team details this process, noting that a 2D FFT also requires a transpose, which can be implemented using either the SIMT or tile paradigm. They explain that composing the three kernels fft_tiled -> transpose -> fft_tiled gives a full 2D forward FFT. The resulting simulation, as demonstrated with decaying turbulence at Re = 1,000, is not only visually compelling but also computationally efficient, with per-launch overhead eliminated through the use of CUDA Graphs.

On a ~134-million-cell lid-driven cavity benchmark, Warp ran about 8x faster than JAX on a single 40 GB NVIDIA A100 Tensor Core GPU, roughly matching the throughput that JAX needed eight A100 Tensor Core GPUs to reach.

Warp Enables Differentiable Simulation and Optimization

The pursuit of increasingly realistic physics-based simulations often clashes with the demands of modern machine learning. While large language models thrive on massive datasets, physics models require high-fidelity data that is computationally expensive to generate; the assumption that more data always improves results does not necessarily hold true when dealing with complex physical systems. NVIDIA’s Warp framework addresses this challenge by fundamentally altering how simulations are constructed and integrated with machine learning workflows. Warp departs from traditional tensor-based frameworks, where computations are expressed as operations on multi-dimensional arrays. This approach is particularly advantageous for simulations reliant on data-dependent control flow (conditionals, early exits, and selective updates), which can be cumbersome and inefficient to implement with tensor operations that require complex masking. In a Warp kernel, “each thread can branch, skip, or exit independently, expressing this logic naturally without masking workarounds,” enabling a level of flexibility previously unavailable.

The power of Warp extends beyond accelerated simulation; it also natively supports automatic differentiation. This capability is crucial for optimization tasks, allowing researchers to seamlessly integrate simulations into training workflows. The example code and solver are available on the NVIDIA/warp GitHub repository for further exploration.

Quantum News

There is so much happening right now in the field of technology, whether AI or the march of robots. Adrian is an expert on how technology can be transformative, especially frontier technologies. But Quantum occupies a special space. Quite literally a special space: a Hilbert space, in fact! Here I try to provide some of the news that is considered breaking news in the Quantum Computing and Quantum tech space.
