NVIDIA Collective Communication Library (NCCL) version 2.23

NVIDIA Collective Communication Library (NCCL) version 2.23 introduces several enhancements to optimize inter-GPU and multi-node communication, crucial for AI and HPC applications. This update brings new algorithms, improved initialization processes, enhanced profiling capabilities, and multiple bug fixes.

New Features and Enhancements

NCCL 2.23 introduces the PAT algorithm, designed to optimize inter-GPU communication at scale. It also accelerates initialization for large-scale systems, improving startup performance. Intranode user buffer registration allows user buffers to be registered within a node, reducing memory allocation overhead. Finally, a new Profiler Plugin API enables developers to build custom profiler plugins for analyzing and optimizing NCCL performance.

Profiler Plugin API and Events

The Profiler Plugin API defines five key function callbacks:

  • init: Initializes the profiler context and sets the event activation mask.
  • startEvent: Starts a new event and returns an opaque handle.
  • stopEvent: Stops an event and marks it as complete.
  • recordEventState: Updates the state of an event.
  • finalize: Releases resources associated with the profiler context.

NCCL supports multiple profiler events categorized in a hierarchical structure, making profiling data more structured and comprehensible. Events include group events, collective events, point-to-point events, and proxy operation events, among others.

Bug Fixes and Minor Improvements

Several bug fixes and minor improvements have been introduced in NCCL 2.23:

  • Asynchronous graph allocation speeds up graph capture.
  • Fatal IB asynchronous events help detect and handle network failures.
  • Improved initialization logs provide better debugging insights.
  • Increased default IB timeout enhances network stability.
  • New NVIDIA peer memory compatibility check improves kernel compatibility.
  • Fixes for performance regressions, NUMA-related crashes, and tree graph search issues ensure better system stability and performance.

Summary

NCCL 2.23 improves inter-GPU and multi-node communication by introducing new algorithms, enhanced profiling tools, and optimizations for large-scale environments. These advancements solidify NCCL’s role in accelerating AI and HPC workloads by improving GPU-based communication efficiency, robustness, and flexibility.
