Hewlett Packard Labs: Convergence of High-Performance Computing and AI Workflows

A team from Hewlett Packard Labs is leading a paradigm shift in the convergence of traditional high-performance computing (HPC) and modern artificial intelligence (AI) computing, a convergence increasingly expressed as workflows. Workflows simplify development, reuse, and operation, and depend increasingly on accelerators and serverless computing. The team predicts nine principles for this convergence, including the adoption of workflows as native applications for HPC and AI and the use of Multilevel Intermediate Representation (MLIR) to address the complexity of heterogeneity. The future of HPC is seen as evolving toward harmoniously engineered workflows, complemented by machine learning and AI.

Convergence of High-Performance Computing and Artificial Intelligence

The convergence of traditional high-performance computing (HPC) and modern artificial intelligence (AI) computing is increasingly being expressed as workflows. This paradigm shift is being led by a team from Hewlett Packard Labs, including Pedro Bruel, Sai Rahul Chalamalasetti, Aditya Dhakal, Eitan Frachtenberg, Ninad Hogade, Rolando Pablo Hong Enriquez, Alok Mishra, Dejan Milojicic, Pavana Prakash, and Gourav Rattihalli.

Workflows provide a higher level of abstraction that simplifies development, reuse, and operation. Both HPC and AI heavily depend on accelerators and are adopting serverless computing, which also raises the level of abstraction and simplifies DevOps. This convergence is analyzed from three perspectives: end users, developers, and service providers.

End users are concerned with latency or throughput of workflows at scale and ease of use. Developers focus on ease of development, constructing workflows from existing workloads, and making quality-of-service guarantees. Service providers primarily care about meeting service-level agreements for user quality of service and maximizing infrastructure utilization.

Principles of Converged High-Performance Computing, AI, and Workflows

The team from Hewlett Packard Labs predicts nine principles of heterogeneity and serverless computing for this convergence. These principles aim to enable seamless scalability and fluidity for end users, increased productivity for developers, and improved performance efficiency for providers.

The first principle is that workflows will become native applications for HPC and AI. As computational problems grow in complexity, they are typically broken into smaller tasks. The end of the Dennard-scaling era, however, is driving accelerator diversification. Native workflows will therefore need to distribute computing tasks among emerging accelerators, redeploy them dynamically at scale, and be generated and tested interactively using low-code programming models and simulations.
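To make the idea of a workflow as a first-class application concrete, here is a minimal Python sketch (not from the paper) of a workflow as a dependency graph of tasks, each carrying a hypothetical placement hint for a heterogeneous accelerator. The `Task` and `Workflow` names and the `resource` hint are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    fn: callable
    resource: str = "cpu"            # hypothetical placement hint: "cpu", "gpu", ...
    deps: list = field(default_factory=list)

class Workflow:
    """A toy workflow: a named set of tasks executed in dependency order."""
    def __init__(self):
        self.tasks = {}

    def add(self, task):
        self.tasks[task.name] = task

    def run(self):
        # Simple topological execution: run a task once all deps are done.
        done, results = set(), {}
        while len(done) < len(self.tasks):
            for t in self.tasks.values():
                if t.name in done or any(d not in done for d in t.deps):
                    continue
                results[t.name] = t.fn(*[results[d] for d in t.deps])
                done.add(t.name)
        return results

wf = Workflow()
wf.add(Task("simulate", lambda: list(range(5)), resource="cpu"))
wf.add(Task("train", lambda xs: sum(xs) / len(xs), resource="gpu", deps=["simulate"]))
wf.add(Task("report", lambda avg: f"mean={avg}", deps=["train"]))
print(wf.run()["report"])  # mean=2.0
```

A real workflow manager would add scheduling, fault tolerance, and dynamic redeployment; the point here is only that the workflow itself, not any single workload, is the unit the system reasons about.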

Multilevel Intermediate Representation and Heterogeneity Complexity

The second principle is that Multilevel Intermediate Representation (MLIR) will address heterogeneity complexity. As applications become more workflow-centric, it becomes increasingly challenging for programming languages and compilers to ensure efficient task execution, unified representation, interoperability, and good abstraction.

MLIR is an open-source project initiated by LLVM to develop a new intermediate representation for compilers. It provides a more expressive, flexible, and efficient way to represent program structures in the front end, enabling efficient compilation, optimization, and interoperability across diverse programming languages, domains, and hardware platforms.
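The "multilevel" in MLIR refers to progressive lowering: a high-level operation is rewritten dialect by dialect toward hardware-specific forms. The following toy Python analogy (not actual MLIR, whose IR is its own textual format) sketches that idea; the dictionary-based "ops" and lowering passes are illustrative assumptions.

```python
# Toy multilevel IR: a high-level op is progressively lowered toward
# hardware-specific forms, mirroring MLIR's dialect-to-dialect passes.
# The op representation here is invented for illustration.

HIGH = {"op": "linalg.matmul", "args": ("A", "B"), "out": "C"}

def lower_to_loops(op):
    """Lower a matrix multiply to an explicit loop nest (loop-level 'dialect')."""
    a, b = op["args"]
    return {"op": "loop.nest", "depth": 3,
            "body": f"{op['out']}[i][j] += {a}[i][k] * {b}[k][j]"}

def lower_to_gpu(loop_op):
    """Map the outer loops onto a GPU launch grid (device-level 'dialect')."""
    return {"op": "gpu.launch", "grid": ("i", "j"), "body": loop_op["body"]}

kernel = lower_to_gpu(lower_to_loops(HIGH))
print(kernel["op"], "->", kernel["body"])
# gpu.launch -> C[i][j] += A[i][k] * B[k][j]
```

Because each level preserves a well-defined representation, optimizations can run where they are most natural, e.g. tiling at the loop level and memory placement at the device level, which is what makes a single IR framework workable across diverse languages and hardware.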

Future of High-Performance Computing and AI

The future of HPC is evolving toward harmoniously engineered workflows whose complexity can be abstracted while still performing optimally under flexible conditions. Human experts, who today are needed to achieve next-level performance on these increasingly complex computational workflows, will be complemented by machine learning and AI.

These techniques will not only be used within the workflows themselves but also as part of the HPC infrastructure. Optimally deploying these heterogeneous codes will require attention to how these codes are represented, as discussed in the principle of MLIR.
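One way ML can assist the infrastructure is by steering placement decisions. Here is a minimal sketch (an assumption, not the paper's method) in which a plain lookup table stands in for a learned performance model that predicts per-device runtimes; the task names and numbers are invented.

```python
# Toy sketch of ML-assisted deployment: choose the accelerator with the
# lowest predicted runtime for each task. The table below stands in for
# a learned cost model; values are hypothetical.

PREDICTED_RUNTIME_S = {            # (task, device) -> predicted seconds
    ("fft", "cpu"): 4.0, ("fft", "gpu"): 0.6,
    ("io",  "cpu"): 1.0, ("io",  "gpu"): 1.2,
}

def place(task, devices=("cpu", "gpu")):
    """Return the device with the lowest predicted runtime for `task`."""
    return min(devices, key=lambda d: PREDICTED_RUNTIME_S[(task, d)])

print(place("fft"))  # gpu
print(place("io"))   # cpu
```

In practice the cost model would be trained on profiling data and would need to account for data movement and contention, but the principle is the same: the infrastructure, not the user, decides where each piece of a heterogeneous workflow runs.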

Evolution of Workflows

The evolution of workflows has moved from scripting and object-oriented programming (OOP) in integrated development environments (IDEs) and libraries to workflow managers. The future lies in native HPC workflows that seamlessly handle heterogeneity and dynamic, scalable deployments.

As HPC grows in complexity, the emergence of big data and the shift from CPU-based HPC clusters to predominantly CPU-GPU on-premises, cloud, and edge computing necessitate the evolution of workflows. Large teams and communities are now tackling multi-domain problems, leading to the development of native HPC workflows.

The article titled "Predicting Heterogeneity and Serverless Principles of Converged High-Performance Computing, Artificial Intelligence, and Workflows" was published on January 1, 2024, in IEEE Computer. The authors are Pedro Bruel, Sai Rahul Chalamalasetti, Aditya Dhakal, Eitan Frachtenberg, Ninad Hogade, Rolando Pablo Hong Enriquez, Alok Mishra, Dejan Milojicic, Pavana Prakash, and Gourav Rattihalli. The article can be accessed through the DOI link https://doi.org/10.1109/mc.2023.3332973.