Zyphra has developed ZAYA1, the first large-scale Mixture-of-Experts (MoE) foundation model trained entirely on AMD Instinct™ MI300X GPUs, AMD Pensando™ networking, and the AMD ROCm™ open software platform. The 192 GB of high-bandwidth memory on each AMD Instinct MI300X let Zyphra simplify large-scale training and achieve 10x faster model save times. The resulting ZAYA1-base model, with 8.3B total and 760M active parameters, performs competitively with or better than leading open models, including Llama-3-8B, OLMoE, Qwen3-4B, and Gemma3-12B, across reasoning, mathematics, and coding benchmarks.
ZAYA1 Model Performance and Benchmarks
Zyphra’s ZAYA1 is a large-scale Mixture-of-Experts (MoE) foundation model, the first trained entirely on AMD Instinct™ MI300X GPUs, AMD Pensando™ networking, and ROCm software. Benchmarking conducted by Zyphra on November 14, 2025, shows that ZAYA1-base (8.3B total, 760M active parameters) outperforms Llama-3-8B and OLMoE and performs comparably to Qwen3-4B and Gemma3-12B, demonstrating the scalability and efficiency of the AMD platform for production-scale AI workloads.
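To illustrate the distinction between total and active parameters in an MoE model, the sketch below shows a minimal top-k routed layer in PyTorch. The expert count, hidden sizes, and top-k value are placeholder assumptions for illustration only and do not reflect ZAYA1’s actual architecture; the point is simply that each token passes through only a small subset of the experts, so the number of parameters active per token is a fraction of the total.

```python
# Illustrative top-k Mixture-of-Experts routing. All sizes below are
# hypothetical placeholders, NOT the ZAYA1 configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)       # routing probabilities per expert
        weights, idx = scores.topk(self.top_k, dim=-1)   # keep only the top-k experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e                    # tokens routed to expert e at slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * self.experts[e](x[mask])
        return out

layer = TinyMoELayer()
tokens = torch.randn(8, 512)
print(layer(tokens).shape)  # torch.Size([8, 512]); only 2 of 16 experts run per token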
The AMD Instinct MI300X GPU’s 192 GB of high-bandwidth memory was critical to ZAYA1’s training. This capacity enabled efficient large-scale training without costly expert or tensor sharding, simplifying the training setup and improving throughput. Zyphra also reported more than 10x faster model save times with AMD-optimized distributed I/O, improving training reliability and efficiency.
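A rough, assumption-laden memory estimate helps explain why sharding can be avoided: assuming a hypothetical layout of BF16 weights plus FP32 master weights and Adam moments (not Zyphra’s published configuration), the full 8.3B-parameter training state fits comfortably within a single MI300X’s 192 GB.

```python
# Back-of-envelope memory estimate. The optimizer layout below is an
# assumption for illustration, not Zyphra's published recipe.
params = 8.3e9
bf16_weights_gb = params * 2 / 1e9        # ~16.6 GB of BF16 weights
adam_states_gb  = params * 3 * 4 / 1e9    # ~99.6 GB if FP32 master weights + two Adam moments
total_gb = bf16_weights_gb + adam_states_gb
print(f"{bf16_weights_gb:.1f} GB weights + {adam_states_gb:.1f} GB optimizer state "
      f"= {total_gb:.0f} GB, under the 192 GB per MI300X")
```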
Zyphra measured the aggregate throughput of training iterations across its cluster, reaching quadrillions of floating-point operations per second (PFLOPS). This testing, completed November 14, 2025, used a cluster of 128 compute nodes, each with 8 AMD Instinct™ MI300X GPUs and AMD Pensando™ Pollara 400 interconnects, running Zyphra’s proprietary training stack. These results highlight the power of co-designing model architectures with silicon and systems.
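For a sense of scale, the arithmetic below uses a purely hypothetical per-GPU sustained throughput figure to show how a 128-node, 8-GPU-per-node cluster reaches aggregate throughput in the PFLOPS range; the actual per-GPU and cluster numbers measured by Zyphra are not reproduced here.

```python
# Cluster-scale arithmetic. The per-GPU figure is a hypothetical placeholder,
# not a measured ZAYA1 number.
nodes, gpus_per_node = 128, 8
total_gpus = nodes * gpus_per_node                  # 1,024 GPUs
per_gpu_tflops = 400                                # hypothetical sustained BF16 TFLOPS per GPU
aggregate_pflops = total_gpus * per_gpu_tflops / 1000
print(f"{total_gpus} GPUs x {per_gpu_tflops} TFLOPS = {aggregate_pflops:.0f} PFLOPS aggregate")
```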
AMD Instinct GPU and Networking Platform
AMD has reached a milestone in AI model training with Zyphra’s development of ZAYA1, the first large-scale Mixture-of-Experts (MoE) foundation model trained on an AMD GPU and networking platform, built on AMD Instinct™ MI300X GPUs, AMD Pensando™ networking, and the AMD ROCm™ open software stack. Zyphra’s technical report details the results, showing that ZAYA1-base outperforms models such as Llama-3-8B and OLMoE and rivals Qwen3-4B and Gemma3-12B across multiple benchmarks.
The MI300X GPU’s 192 GB of high-bandwidth memory enabled efficient large-scale training for ZAYA1 without costly expert or tensor sharding, and Zyphra reported more than 10x faster model save times using AMD-optimized distributed I/O. Despite activating only 760M of its 8.3B parameters per token, ZAYA1-base matches or exceeds the performance of models with far more active parameters, such as Qwen3-4B and Gemma3-12B.
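As a general illustration of why distributed I/O speeds up checkpointing (this is a generic sketch, not Zyphra’s or AMD’s actual I/O implementation), the example below has every rank write its own state file in parallel under torchrun, rather than funneling the entire model through a single writer.

```python
# Minimal per-rank parallel checkpointing sketch, intended to be launched with
# e.g.: torchrun --nproc_per_node=8 save_shards.py
# This is an illustration of the general idea, not a specific vendor I/O stack.
import os
import torch
import torch.distributed as dist

def save_checkpoint(model: torch.nn.Module, step: int, out_dir: str = "checkpoints"):
    rank = dist.get_rank()
    os.makedirs(out_dir, exist_ok=True)
    # Each rank writes its local state to its own file, so writes proceed in
    # parallel across all ranks instead of serializing through one process.
    path = os.path.join(out_dir, f"step{step:07d}_rank{rank:04d}.pt")
    torch.save(model.state_dict(), path)
    dist.barrier()  # wait until every rank's file is on disk before the checkpoint counts as complete

if __name__ == "__main__":
    dist.init_process_group(backend="nccl" if torch.cuda.is_available() else "gloo")
    model = torch.nn.Linear(1024, 1024)  # stand-in for a real model
    save_checkpoint(model, step=1000)
    dist.destroy_process_group()
```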
This achievement was supported by a jointly engineered system from AMD and IBM, combining AMD Instinct™ MI300X GPUs with IBM Cloud’s high-performance fabric and storage. Testing by Zyphra, as of November 14, 2025, measured the aggregate throughput of training iterations across the cluster in the quadrillions of floating-point operations per second (PFLOPS), using Zyphra’s proprietary training stack on 128 compute nodes, each containing 8 AMD Instinct™ MI300X GPUs.
Efficient Training with AMD Technologies
AMD technologies are enabling efficient large-scale AI model training, as demonstrated by Zyphra’s development of ZAYA1. The model was trained entirely on AMD Instinct™ MI300X GPUs, using AMD Pensando™ networking and the AMD ROCm™ open software stack. The 192 GB of high-bandwidth memory in each MI300X simplified training by avoiding costly expert or tensor sharding and improved throughput.
ZAYA1-base (8.3B total, 760M active parameters) matches or exceeds the performance of models such as Qwen3-4B, Gemma3-12B, Llama-3-8B, and OLMoE. Zyphra also reported more than 10x faster model save times with AMD-optimized distributed I/O. These improvements were realized on a jointly engineered system from AMD and IBM, combining MI300X GPUs with IBM Cloud’s high-performance fabric and storage architecture.
Testing conducted by Zyphra on November 14, 2025, measured the aggregate throughput of training iterations across the full cluster. The workload, training a model in BFLOAT16 across 128 compute nodes, each with 8 MI300X GPUs and AMD Pensando™ Pollara 400 interconnects, highlights the power of co-designing model architectures with silicon and systems to deliver frontier intelligence.
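For readers unfamiliar with BFLOAT16 training, the following is a minimal, hedged sketch of a single BF16 training step using PyTorch autocast; the model, optimizer, and hyperparameters are placeholders and do not represent ZAYA1’s training recipe.

```python
# Minimal BF16 training step via autocast. Model and hyperparameters are
# illustrative placeholders only.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(4096, 4096).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(32, 4096, device=device)
target = torch.randn(32, 4096, device=device)

opt.zero_grad(set_to_none=True)
with torch.autocast(device_type=device, dtype=torch.bfloat16):
    loss = torch.nn.functional.mse_loss(model(x), target)  # forward pass runs in BF16
loss.backward()  # backward runs outside autocast; gradients match the FP32 parameter dtype
opt.step()
print(float(loss))
```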
“AMD leadership in accelerated computing is empowering innovators like Zyphra to push the boundaries of what’s possible in AI.”
Emad Barsoum, corporate vice president of AI and engineering, Artificial Intelligence Group, AMD
