NVIDIA Collective Communication Library (NCCL) version 2.23

NVIDIA Collective Communication Library (NCCL) version 2.23 introduces several enhancements to optimize inter-GPU and multi-node communication, crucial for AI and HPC applications. This update brings new algorithms, improved initialization processes, enhanced profiling capabilities, and multiple bug fixes.

New Features and Enhancements

NCCL 2.23 introduces the PAT (Parallel Aggregated Trees) algorithm, designed to optimize inter-GPU communication, along with accelerated initialization at scale, improving startup performance for large systems. Intranode user buffer registration now allows user buffers to be registered within a node, reducing memory allocation overhead. Additionally, a Profiler Plugin API has been introduced, enabling the development of custom profiler plugins to analyze and optimize NCCL performance.
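Of these features, intranode user buffer registration is the one most directly visible to application code, through the ncclCommRegister and ncclCommDeregister calls. The following is a minimal sketch, assuming a communicator, CUDA stream, and device buffer have already been created elsewhere, with error handling reduced to a single macro.

```c
/* Minimal sketch: register a user-allocated device buffer with an NCCL
 * communicator so collectives can operate on it directly, then reuse it
 * for an in-place all-reduce. comm, stream, and devbuf are assumed to be
 * set up elsewhere. */
#include <cuda_runtime.h>
#include <nccl.h>
#include <stdio.h>
#include <stdlib.h>

#define CHECK_NCCL(cmd) do {                                    \
    ncclResult_t r_ = (cmd);                                    \
    if (r_ != ncclSuccess) {                                    \
      fprintf(stderr, "NCCL error %s:%d: %s\n", __FILE__,       \
              __LINE__, ncclGetErrorString(r_));                \
      exit(EXIT_FAILURE);                                       \
    }                                                           \
  } while (0)

void allreduce_with_registered_buffer(ncclComm_t comm, cudaStream_t stream,
                                      float* devbuf, size_t count) {
  void* handle = NULL;

  /* Register the device buffer with this communicator once. */
  CHECK_NCCL(ncclCommRegister(comm, devbuf, count * sizeof(float), &handle));

  /* Reuse the registered buffer across collectives (in-place all-reduce). */
  CHECK_NCCL(ncclAllReduce(devbuf, devbuf, count, ncclFloat, ncclSum,
                           comm, stream));

  /* Ensure outstanding work on the buffer is done before deregistering. */
  cudaStreamSynchronize(stream);
  CHECK_NCCL(ncclCommDeregister(comm, handle));
}
```

Registration is done once per communicator and buffer; the returned handle is only needed to deregister the buffer later, so a long-lived buffer can be reused across many collective calls.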

Profiler Plugin API and Events

The Profiler Plugin API defines five key function callbacks (a hedged sketch of a plugin built from them follows the list):

  • init: Initializes the profiler context and sets the event activation mask.
  • startEvent: Starts a new event and returns an opaque handle.
  • stopEvent: Stops an event and marks it as complete.
  • recordEventState: Updates the state of an event.
  • finalize: Releases resources associated with the profiler context.
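Conceptually, a profiler plugin is a table of these five callbacks that NCCL loads at runtime. The sketch below is illustrative only: the descriptor and state types are placeholders, not the actual definitions, which a real plugin must take from the ext-profiler headers shipped with the NCCL source.

```c
/* Illustrative sketch of the five profiler callbacks. The descriptor and
 * state types below are placeholders; a real plugin must use the exact
 * definitions from NCCL's ext-profiler headers and export them through the
 * versioned plugin struct NCCL expects. */
#include <stdio.h>
#include <stdlib.h>

typedef struct { int type; /* event-specific fields omitted */ } EventDescr; /* placeholder */
typedef int EventState;                                                      /* placeholder */

typedef struct {
  long events_started;            /* simple per-context counter */
} ProfilerContext;

/* init: allocate a context and set the mask of event types to activate. */
static int profInit(void** context, int* eActivationMask) {
  ProfilerContext* ctx = calloc(1, sizeof(*ctx));
  if (ctx == NULL) return 1;
  *eActivationMask = ~0;          /* illustrative: request every event type */
  *context = ctx;
  return 0;
}

/* startEvent: begin a new event and return an opaque handle for it. */
static int profStartEvent(void* context, void** eHandle, EventDescr* descr) {
  ProfilerContext* ctx = (ProfilerContext*)context;
  ctx->events_started++;
  *eHandle = descr;               /* illustrative: descriptor doubles as handle */
  return 0;
}

/* stopEvent: the event behind eHandle is complete. */
static int profStopEvent(void* eHandle) {
  (void)eHandle;
  return 0;
}

/* recordEventState: an in-flight event changed state. */
static int profRecordEventState(void* eHandle, EventState state) {
  (void)eHandle; (void)state;
  return 0;
}

/* finalize: release everything owned by the context. */
static int profFinalize(void* context) {
  ProfilerContext* ctx = (ProfilerContext*)context;
  fprintf(stderr, "profiler plugin observed %ld events\n", ctx->events_started);
  free(ctx);
  return 0;
}
```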

NCCL organizes profiler events in a hierarchy, which makes the profiling data easier to navigate and interpret. Events include group events, collective events, point-to-point events, and proxy operation events, among others.

Bug Fixes and Minor Improvements

Several bug fixes and minor improvements have been introduced in NCCL 2.23:

  • Asynchronous graph allocation speeds up graph capture.
  • Handling of fatal InfiniBand asynchronous events helps detect and respond to network failures.
  • Improved initialization logs provide better debugging insight.
  • An increased default InfiniBand timeout improves network stability.
  • A new check for the NVIDIA peer memory kernel module improves compatibility.
  • Fixes for performance regressions, NUMA-related crashes, and tree graph search issues improve overall stability and performance.

Summary

NCCL 2.23 improves inter-GPU and multi-node communication by introducing new algorithms, enhanced profiling tools, and optimizations for large-scale environments. These advancements solidify NCCL’s role in accelerating AI and HPC workloads by improving GPU-based communication efficiency, robustness, and flexibility.


Quantum News

As the Official Quantum Dog (or hound), my role is to dig out the latest nuggets of quantum goodness. There is so much happening right now in technology, whether AI or the march of the robots, but quantum occupies a special space. Quite literally a special space: a Hilbert space, in fact, haha! Here I try to provide some of the news that might be considered breaking in the quantum computing space.

Latest Posts by Quantum News:

  • PI Enables 21% More Throughput in High-Tech Manufacturing Automation (January 21, 2026)
  • Alice & Bob Achieves 10,000x Lower Error Rate with New Elevator Codes (January 21, 2026)
  • WISeKey Unveils Space-Based Quantum-Resistant Crypto Transactions at Davos 2026 (January 21, 2026)