MangoBoost has validated the scalability and efficiency of large language model training on 32 AMD Instinct MI300X GPUs across a four-node cluster, achieving a record 10.91-minute fine-tuning of the Llama2-70B-LoRA model in the MLPerf Training v5.0 benchmarks. The result, the first multi-node MLPerf submission on AMD GPUs, demonstrates near-linear scaling efficiency of 95.1%, and internal benchmarks confirm compatibility with Llama2-7B and Llama3-8B, positioning the platform as a viable alternative to vendor-locked GPU systems for enterprise data centres. The achievement rests on the combination of AMD’s MI300X GPUs and ROCm software, MangoBoost’s LLMBoost AI Enterprise software, and its GPUBoost RoCEv2 network interface card, which together enable scalable, cost-efficient AI infrastructure.
MangoBoost Delivers Vendor-Neutral AI with AMD GPUs, Achieves Near-Linear Scaling
MangoBoost’s platform supports multiple model sizes, including Llama2-7B and Llama3-8B, broadening its applicability across diverse large language models. That versatility reduces infrastructure-investment risk: organisations can deploy a single, scalable platform for multiple AI initiatives rather than one system per model. By delivering predictable speed-ups as computational resources grow, the platform also mitigates the performance degradation commonly observed in distributed training.
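As a rough illustration of what near-linear scaling means here: with perfect (100%) scaling, four nodes would finish in exactly one quarter of the single-node time. The short sketch below works the published figures backwards; the implied single-node baseline is an inference from those figures, not a number MangoBoost reported.

    # Back-of-envelope check of the reported scaling figures.
    # Note: the single-node baseline below is *implied* by the published
    # 4-node time and efficiency; MangoBoost did not report it directly.

    nodes = 4
    t_multi_min = 10.91   # reported 4-node, 32-GPU fine-tuning time (minutes)
    efficiency = 0.951    # reported near-linear scaling efficiency

    # efficiency = ideal_time / actual_time = (t_single / nodes) / t_multi
    # => implied single-node time:
    t_single_min = efficiency * nodes * t_multi_min
    speedup = t_single_min / t_multi_min

    print(f"implied single-node time: {t_single_min:.1f} min")        # ~41.5 min
    print(f"effective speedup on 4 nodes: {speedup:.2f}x (ideal 4.00x)")

The roughly five-point gap from ideal reflects the inter-node communication overhead that the networking and software stack are designed to minimise.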
This MLPerf Training v5.0 submission establishes a functional alternative to vendor-locked systems, demonstrating that enterprises can build and train large language models with greater flexibility and control. MangoBoost’s vendor-neutral approach lets organisations avoid lock-in and combine the best available hardware and software components.
The system pairs MangoBoost’s LLMBoost AI Enterprise software and GPUBoost RoCEv2 NIC with AMD’s MI300X GPUs and the ROCm software stack. Tight integration between these components keeps inter-GPU communication and resource allocation efficient, which is critical for sustaining throughput in demanding distributed AI workloads. Ongoing development concentrates on communication optimisation, hybrid parallelism, and topology-aware scheduling to further improve performance and hardware utilisation.
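MangoBoost has not published LLMBoost’s internals, so as a generic illustration only, the sketch below shows how a 4-node, 8-GPU-per-node layout is commonly expressed in stock PyTorch with a two-dimensional device mesh: parameter shards stay on the fast links inside a node, while only gradient reduction crosses the inter-node RoCEv2 fabric. Every name and hyperparameter here is an illustrative assumption, not LLMBoost code; on ROCm, AMD GPUs appear through the same cuda device API, and collectives run over RCCL (launched via the nccl backend).

    # Hypothetical sketch of hybrid (2-D) parallelism for a 4-node,
    # 8-GPU-per-node cluster. Generic PyTorch, not MangoBoost's stack.
    import os
    import torch
    import torch.distributed as dist
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.fsdp import (
        FullyShardedDataParallel as FSDP,
        ShardingStrategy,
    )

    def main():
        # Assumes launch via: torchrun --nnodes 4 --nproc-per-node 8 ...
        dist.init_process_group(backend="nccl")  # RCCL on ROCm builds
        torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

        # 2-D mesh: replicate across nodes (outer, data-parallel dim),
        # shard parameters across the 8 GPUs inside each node (inner dim).
        mesh = init_device_mesh("cuda", (4, 8),
                                mesh_dim_names=("replicate", "shard"))

        # Stand-in layer; a real run would wrap transformer blocks.
        model = torch.nn.Linear(4096, 4096).cuda()
        model = FSDP(model,
                     device_mesh=mesh,
                     sharding_strategy=ShardingStrategy.HYBRID_SHARD)

        # Keeping shards node-local means the bandwidth-hungry all-gathers
        # stay on intra-node links; only gradient all-reduces cross the
        # inter-node RoCEv2 network.
        opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
        x = torch.randn(8, 4096, device="cuda")
        model(x).sum().backward()
        opt.step()
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()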
MangoBoost frames the submission as a significant advance in large language model training on AMD Instinct MI300X GPUs. Validated through the MLPerf Training v5.0 benchmark, the system sustained 95.1% near-linear scaling efficiency across the four-node, 32-GPU cluster during Llama2-70B-LoRA fine-tuning, directly addressing the performance degradation that typically accompanies distributed training.
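For readers unfamiliar with the benchmark’s workload: LoRA fine-tuning freezes the pretrained weights and trains small low-rank adapter matrices injected into the attention projections, which is why a 70B-parameter model can be fine-tuned in minutes at this scale. A minimal, generic setup with the Hugging Face PEFT library is sketched below; the rank, alpha, and target modules shown are illustrative defaults, not the MLPerf reference configuration.

    # Generic LoRA adapter setup with Hugging Face PEFT (illustrative only;
    # hyperparameters are assumptions, not the MLPerf reference values).
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-70b-hf")

    lora_cfg = LoraConfig(
        r=16,                                 # adapter rank (size of low-rank update)
        lora_alpha=32,                        # scaling factor applied to the update
        target_modules=["q_proj", "v_proj"],  # attention projections to adapt
        lora_dropout=0.1,
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(base, lora_cfg)
    model.print_trainable_parameters()  # typically well under 1% of total weights

Because only the adapter weights train, only their gradients need synchronising across nodes, which also shrinks the communication volume that the inter-node network must carry.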
