NVIDIA has announced that its RAPIDS cuDF software can now accelerate pandas, the popular data analysis library, without requiring users to change their code. The update targets data scientists whose pandas workflows slow down as dataset sizes grow. RAPIDS is an open-source suite of GPU-accelerated Python libraries designed to improve data science and analytics pipelines. The RAPIDS v23.10 release brings accelerated computing to pandas workflows through a unified CPU/GPU user experience. This feature is currently available in open beta and will be supported in NVIDIA AI Enterprise soon.
NVIDIA’s RAPIDS cuDF Enhances Pandas Users’ Experience
NVIDIA has announced that its RAPIDS cuDF can now bring GPU acceleration to 9.5 million pandas users without requiring them to change their code. Pandas, a flexible and powerful data analysis and manipulation library for Python, is a top choice for data scientists due to its easy-to-use API. However, as dataset sizes grow, it struggles with processing speed and efficiency in CPU-only systems.
RAPIDS is an open-source suite of GPU-accelerated Python libraries designed to improve data science and analytics pipelines. RAPIDS cuDF is a GPU DataFrame library that provides a pandas-like API for loading, filtering, and manipulating data. Earlier releases of cuDF were designed for GPU-only development workflows.
“This feature was built for data scientists who want to continue using pandas as data sizes grow into the gigabytes and pandas performance slows. In cuDF’s pandas accelerator mode, operations execute on the GPU where possible and on the CPU (using pandas) otherwise, synchronizing under the hood as needed. This enables a unified CPU/GPU experience that brings best-in-class performance to your pandas workflows.”
NVIDIA
With the RAPIDS v23.10 release, cuDF brings accelerated computing to pandas workflows with no code changes, through a unified CPU/GPU user experience in its new pandas accelerator mode. The mode is available today in the open-source RAPIDS v23.10 release as an open beta and will be supported in NVIDIA AI Enterprise soon.
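NVIDIA describes the accelerator mode as running each operation on the GPU where possible and on the CPU (using pandas) otherwise. The toy sketch below illustrates only that "try the fast path, fall back to the slow path" dispatch idea in plain Python. It is a hypothetical model, not cuDF's implementation: the `FastSlowDispatcher` class and both backends are invented for illustration, the "fast" and "slow" labels stand in for GPU and CPU, and the real mode also moves data between devices, which this sketch skips.

```python
import statistics
from typing import Any, Callable, Dict, List, Tuple

# HYPOTHETICAL toy model: "fast" stands in for the GPU (cuDF) and "slow" for
# the CPU (pandas). This is not how cudf.pandas is implemented, and it skips
# the data movement between devices that the real mode performs.
Backend = Dict[str, Callable[..., Any]]


class FastSlowDispatcher:
    def __init__(self, fast: Backend, slow: Backend) -> None:
        self.fast = fast
        self.slow = slow
        self.log: List[Tuple[str, str]] = []  # (operation, backend that ran it)

    def call(self, op: str, *args: Any) -> Any:
        fast_fn = self.fast.get(op)
        if fast_fn is not None:
            try:
                result = fast_fn(*args)
                self.log.append((op, "fast"))
                return result
            except NotImplementedError:
                pass  # fast backend declined this call; fall back below
        result = self.slow[op](*args)  # slow backend is assumed to support every op
        self.log.append((op, "slow"))
        return result


def fast_median(values: List[float]) -> float:
    # Simulates an operation the fast backend has not implemented yet.
    raise NotImplementedError("median not available on the fast path")


# Both backends compute identical results; only the label in the log differs.
fast_backend: Backend = {"sum": sum, "median": fast_median}
slow_backend: Backend = {"sum": sum, "median": statistics.median, "max": max}

dispatcher = FastSlowDispatcher(fast_backend, slow_backend)
data = [3, 1, 4, 1, 5]
results = [dispatcher.call(op, data) for op in ("sum", "median", "max")]
print(results)         # [14, 3, 5]
print(dispatcher.log)  # [('sum', 'fast'), ('median', 'slow'), ('max', 'slow')]
```

The caller never chooses a backend: an operation missing from the fast path ("max") and one that the fast path declines at run time ("median") both land on the slow path transparently, which is the property that lets unchanged pandas code keep working.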
Unified CPU/GPU Experience for Pandas Workflows
cuDF has always offered top-tier DataFrame performance through a pandas-like API. However, adopting it has sometimes required workarounds: coding around pandas functionality not yet implemented or supported in cuDF, maintaining separate code paths for CPU and GPU execution in codebases that must run on heterogeneous hardware, and manually switching between cuDF and pandas when interacting with other PyData libraries or organization-specific tooling designed for pandas.
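For context, the sketch below shows the kind of dual-code-path workaround those challenges refer to: pick a DataFrame library at import time, keep to the API subset both support, and convert back to pandas before handing results to pandas-only tools. It assumes cuDF may or may not be installed; the `xdf` alias, the `BACKEND` flag, and the sample data are illustrative names, not part of either library. On a machine without RAPIDS it simply runs on pandas.

```python
# Pre-accelerator-mode workaround: choose a DataFrame library at import time
# and restrict the code to API features both libraries support.
try:
    import cudf as xdf  # GPU DataFrame library (requires RAPIDS and an NVIDIA GPU)
    BACKEND = "cudf"
except ImportError:
    import pandas as xdf  # CPU fallback
    BACKEND = "pandas"

df = xdf.DataFrame({"key": ["a", "b", "a", "c"], "value": [1, 2, 3, 4]})
# Sort explicitly: group order is not guaranteed to match across libraries.
totals = df.groupby("key")["value"].sum().sort_index()

# Tools written for pandas need a pandas object, so the GPU path needs an
# explicit device-to-host conversion (cuDF objects provide .to_pandas()).
if BACKEND == "cudf":
    totals = totals.to_pandas()

print(BACKEND)
print(totals.to_dict())
```

Every branch on `BACKEND` is code that must be written, tested on both kinds of hardware, and maintained; accelerator mode is meant to remove that branching.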
Starting with the RAPIDS v23.10 release, cuDF provides a pandas accelerator mode that addresses these challenges, alongside the existing GPU-only experience. This feature was built for data scientists who want to continue using pandas as data sizes grow into the gigabytes and pandas performance slows.
Features of the Latest cuDF Release
With this release, cuDF provides zero-code-change acceleration, third-party library compatibility, and unified CPU/GPU workflows. To bring GPU acceleration to pandas code in a Jupyter notebook, load the cudf.pandas extension (%load_ext cudf.pandas) before importing pandas. To accelerate a Python script, run it through the cudf.pandas module (python -m cudf.pandas script.py).
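When neither the notebook extension nor the module launcher is convenient, cudf.pandas also exposes a programmatic switch, `cudf.pandas.install()`, which must be called before pandas is imported. The sketch below wraps that call in a try/except so the same file still runs on machines without RAPIDS; the guard, the `ACCELERATED` flag, and the store/sales data are assumptions for illustration. Everything after the guard is ordinary pandas.

```python
# Programmatic equivalent of the notebook extension or the module launcher.
# It must run before pandas is imported; the guard lets the same file run on
# machines without RAPIDS installed.
try:
    import cudf.pandas

    cudf.pandas.install()
    ACCELERATED = True
except ImportError:
    ACCELERATED = False  # cuDF not installed: ordinary CPU pandas

import pandas as pd  # unchanged pandas code from here on

df = pd.DataFrame({"store": ["x", "y", "x", "y"], "sales": [10.0, 20.0, 30.0, 40.0]})
mean_sales = df.groupby("store")["sales"].mean()
print("accelerated:", ACCELERATED)
print(mean_sales)
```

Launching with `python -m cudf.pandas` keeps the file free of even this guard, which is the truly zero-code-change route; the explicit call is mainly useful when you control an entry point but not how the interpreter is started.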
Improved Performance for Pandas Workflows
As data sizes scale into the gigabytes, pandas performance often becomes a bottleneck. With cuDF's pandas accelerator mode, you can keep using pandas as your primary tool while gaining GPU acceleration. In NVIDIA's benchmark, the pandas code ran unchanged and achieved significant speedups, using the GPU for most operations and falling back to the CPU for a small portion to ensure that the workflow succeeds.
In that benchmark, the unified CPU/GPU experience reduced processing time from minutes to 1 or 2 seconds, with no code changes required. For details on these benchmark results and how to reproduce them, see the cuDF documentation.
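To compare CPU and accelerator-mode timings on your own hardware, one option is a small timing script run twice without edits: once as `python bench.py` and once as `python -m cudf.pandas bench.py`. This sketch is not NVIDIA's benchmark. The file name `bench.py`, the row counts, the synthetic data, and the two operations (a groupby-sum and an inner merge) are arbitrary assumptions, and the printed times depend entirely on your hardware. A single run also includes one-time startup costs, so repeated runs give a fairer comparison.

```python
import time

import numpy as np
import pandas as pd


def build_frames(n_rows: int, n_keys: int, seed: int = 0):
    """Synthetic fact table plus a small lookup table (illustrative only)."""
    rng = np.random.default_rng(seed)
    left = pd.DataFrame({
        "key": rng.integers(0, n_keys, n_rows),  # keys in [0, n_keys)
        "value": rng.random(n_rows),
    })
    right = pd.DataFrame({"key": np.arange(n_keys), "label": np.arange(n_keys) % 7})
    return left, right


def timed(label, fn):
    """Run fn once and print its wall-clock time."""
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.3f} s")
    return result


left, right = build_frames(n_rows=1_000_000, n_keys=1_000)
grouped = timed("groupby-sum", lambda: left.groupby("key")["value"].sum())
joined = timed("inner merge", lambda: left.merge(right, on="key", how="inner"))
```

Because the script only uses the pandas API, the same file serves as both the CPU baseline and the accelerated run; only the launch command changes.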
Summary
Pandas is the most popular DataFrame library in the Python ecosystem, but on CPUs it slows down as data sizes grow. With cuDF's pandas accelerator mode now available in open beta as part of the RAPIDS v23.10 release, you can bring accelerated computing to your pandas workflows without changing your code. Based on an analytics benchmark processing a 5 GB dataset, you can achieve 150x faster processing.
Key Takeaways
RAPIDS cuDF, the GPU DataFrame library in the open-source RAPIDS suite of GPU-accelerated Python libraries, has been updated to improve data processing speed and efficiency for users of pandas, the popular data analysis and manipulation library for Python. The latest release offers a unified CPU/GPU user experience that lets data scientists keep using pandas on larger datasets without changing their code, with up to 150x faster processing on a 5 GB analytics benchmark.
- NVIDIA has announced that its RAPIDS cuDF software can now bring GPU acceleration to 9.5 million users of the pandas data analysis library for Python, without requiring any changes to their code.
- The pandas library is popular among data scientists due to its easy-to-use API, but it struggles with processing speed and efficiency in CPU-only systems as dataset sizes grow.
- RAPIDS is an open-source suite of GPU-accelerated Python libraries designed to improve data science and analytics pipelines. RAPIDS cuDF is a GPU DataFrame library that provides a pandas-like API for loading, filtering, and manipulating data.
- The RAPIDS v23.10 release brings accelerated computing to pandas workflows with no code changes, through a unified CPU/GPU user experience in cuDF's new pandas accelerator mode.
- This feature was built for data scientists who want to continue using pandas as data sizes grow into the gigabytes and pandas performance slows. In cuDF’s pandas accelerator mode, operations execute on the GPU where possible and on the CPU (using pandas) otherwise, synchronizing under the hood as needed.
- With the latest release, cuDF provides zero-code-change acceleration, third-party library compatibility, and unified CPU/GPU workflows.
- Based on an analytics benchmark processing a 5 GB dataset, cuDF's pandas accelerator mode can deliver 150x faster processing.
