On April 18, 2025, a team led by Vicki Carrica and Maxwell Onyango published Toward Portable GPU Performance: Julia Recursive Implementation of TRMM and TRSM, detailing an efficient Julia-based approach to triangular matrix operations on NVIDIA, AMD, and Apple Silicon GPUs.
The paper presents a recursive Julia implementation of triangular matrix-matrix multiplication (TRMM) and triangular solve (TRSM) for GPUs, restructured so that the bulk of the work is expressed as general matrix-matrix multiplication (GEMM) and thus makes better use of the GPU memory hierarchy.
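To make the restructuring concrete, here is a minimal sketch of the recursive idea in plain Julia, written for CPU arrays and not taken from the paper's code: a lower-triangular solve L·X = B is split into 2×2 blocks so that the off-diagonal update becomes a GEMM, and only small diagonal blocks fall back to a standard triangular kernel. The function name, the cutoff parameter, and the blocking choices are illustrative assumptions.

```julia
# Minimal sketch (not the authors' code): solve L * X = B in place for a
# lower-triangular L by splitting into 2x2 blocks, so the bulk of the work
# becomes a GEMM update; small diagonal blocks use the standard solve.
using LinearAlgebra

function rec_trsm!(L::AbstractMatrix, B::AbstractMatrix; cutoff::Int = 64)
    n = size(L, 1)
    if n <= cutoff
        ldiv!(LowerTriangular(L), B)          # base case: plain triangular solve
        return B
    end
    k = n ÷ 2
    L11 = view(L, 1:k,   1:k)
    L21 = view(L, k+1:n, 1:k)
    L22 = view(L, k+1:n, k+1:n)
    B1  = view(B, 1:k,   :)
    B2  = view(B, k+1:n, :)
    rec_trsm!(L11, B1; cutoff)                            # X1 = L11 \ B1
    mul!(B2, L21, B1, -one(eltype(B)), one(eltype(B)))    # B2 -= L21 * X1  (the GEMM step)
    rec_trsm!(L22, B2; cutoff)                            # X2 = L22 \ B2
    return B
end
```

Because the recursion bottoms out quickly and the off-diagonal updates dominate, most of the arithmetic lands in GEMM, which is the operation GPUs and their vendor BLAS libraries optimize most aggressively.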
Using Julia’s multiple dispatch, metaprogramming, and frameworks like GPUArrays and KernelAbstractions, the authors developed a hardware-agnostic API supporting NVIDIA, AMD, and Apple Silicon GPUs. For large matrices, the implementation achieves throughput comparable to vendor libraries like cuBLAS and rocBLAS while providing TRMM/TRSM routines for Apple Silicon for the first time. The concise codebase demonstrates Julia’s ability to deliver near-vendor performance across heterogeneous architectures.
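The portability claim rests on Julia's dispatch: generic code written against AbstractMatrix picks up backend-specific kernels when handed GPU array types. The snippet below is a hedged illustration of that mechanism, not the paper's API; it assumes CUDA.jl and an NVIDIA GPU are available, and AMDGPU.jl's ROCArray or Metal.jl's MtlArray would be substituted the same way on other hardware.

```julia
# Hedged illustration (assumes CUDA.jl and an NVIDIA GPU): the same generic
# calls used in the recursive sketch above dispatch to vendor kernels when
# the arguments are GPU arrays, so the high-level algorithm needs no changes.
using CUDA, LinearAlgebra

n = 1024
L = CuArray(Matrix(LowerTriangular(rand(Float32, n, n)) + n * I))
B = CUDA.rand(Float32, n, 32)
X = copy(B)

ldiv!(LowerTriangular(L), X)    # dispatches to a cuBLAS triangular solve
C = CUDA.zeros(Float32, n, 32)
mul!(C, L, X)                   # dispatches to a cuBLAS GEMM
@assert Array(C) ≈ Array(B)     # L * (L \ B) ≈ B, checked on the host
```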
NVIDIA remains at the forefront of GPU technology, with broad impact on artificial intelligence, scientific research, and high-performance computing (HPC). Its innovations are aimed at improving efficiency, scalability, and adaptability across diverse applications.
👉 More information
🗞 Toward Portable GPU Performance: Julia Recursive Implementation of TRMM and TRSM
🧠 DOI: https://doi.org/10.48550/arXiv.2504.13821
