IBM Granite 4.0 Achieves 2x Faster AI Inference Speeds

IBM achieved the highest score, 95%, in the 2025 Foundation Model Transparency Index (FMTI), an evaluation conducted by Stanford University’s Center for Research on Foundation Models, along with the largest year-over-year improvement of any developer assessed. This recognition stems from the transparency of IBM’s Granite family of open, performant small language models (SLMs), specifically Granite 3.3 during the evaluation period. This level of transparency enables businesses to deploy AI with greater reliability, accuracy, and control by understanding how models are built and governed, reducing risk and maximizing value from existing data and applications. The achievement stands out against a declining average transparency score across the industry.

IBM’s Leading Score in AI Transparency

IBM achieved the highest score in the 2025 Foundation Model Transparency Index (FMTI), earning 95%. This marks a significant achievement, representing both the highest score in the Index’s three-year history and the largest year-over-year improvement of any developer evaluated. Notably, while most developers saw transparency scores decline, IBM moved in the opposite direction. This level of transparency is critical, enabling businesses to deploy AI with increased reliability, accuracy, and control by understanding how models are built and governed.

The evaluation focused on IBM’s Granite 3.3 model, and the company continues to prioritize transparency with the newer Granite 4.0. Granite 4.0 builds on this momentum, delivering performance gains with up to 2x faster inference and 70% lower memory requirements for complex tasks. These models are open source under the Apache 2.0 license, cryptographically signed, and the first open models to receive ISO 42001 certification, demonstrating a commitment to security, governance, and transparency.

IBM’s approach extends beyond the Granite models themselves. Their software portfolio, including watsonx, integrates governance, security, and observability, offering visibility into AI usage and safeguards. This flexibility allows organizations to scale AI safely across any cloud, model, or data environment, working with models like Llama, Mistral, and Claude alongside Granite. The company aims to provide a foundation for confident AI deployment and governance.

Granite Models: Performance and Openness

IBM’s Granite models were key to the company achieving the highest score – 95% – in the 2025 Foundation Model Transparency Index (FMTI). This result represents the largest year-over-year improvement of any developer evaluated, a stark contrast to the industry-wide decline in transparency, where the average score dropped to 41%. The evaluation focused on Granite 3.3, and it builds on momentum within the open-source ecosystem, where Granite models have achieved nearly 20 million downloads in the past year.

Granite 4.0, the next generation of the model family, delivers both high performance and efficiency gains. It features a hybrid architecture enabling up to 2x faster inference and 70% lower memory requirements compared to similar models handling complex tasks. Importantly, Granite models remain open source under the Apache 2.0 license, are ISO 42001 certified, and are cryptographically signed, demonstrating adherence to security and governance best practices.
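A memory reduction of that order is consistent with a hybrid design in which only a fraction of layers maintain a key-value (KV) cache that grows with context length, while the remaining layers hold a fixed-size state. The sketch below is a back-of-envelope illustration of that effect; the layer counts, head counts, and dimensions are invented for the example and are not Granite’s actual configuration.

```python
def kv_cache_bytes(n_layers, n_heads, head_dim, seq_len, bytes_per=2):
    """Per-sequence key-value cache size for attention layers (fp16).

    Keys and values each store n_heads * head_dim values per token,
    per layer, hence the leading factor of 2.
    """
    return 2 * n_layers * n_heads * head_dim * seq_len * bytes_per

# Illustrative dimensions only; not Granite's real configuration.
full_attn = kv_cache_bytes(n_layers=40, n_heads=32, head_dim=128,
                           seq_len=128_000)

# Suppose the hybrid keeps attention (and thus a KV cache) in only a
# quarter of the layers, and the rest carry a fixed-size state whose
# memory does not grow with sequence length.
hybrid = kv_cache_bytes(n_layers=10, n_heads=32, head_dim=128,
                        seq_len=128_000)

print(f"{1 - hybrid / full_attn:.0%} less cache memory")  # 75% in this toy setup
```

The point of the sketch is qualitative: because the KV cache scales linearly with both layer count and context length, cutting the number of cache-bearing layers translates directly into large memory savings on long-context workloads.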

IBM’s commitment to openness extends beyond the Granite family; the company integrates with models like Llama, Mistral, and Claude. Their watsonx software provides visibility into AI usage, security, and safeguards. This focus on transparency and responsible AI development aims to enable organizations to confidently deploy and govern AI across any environment, building on existing investments rather than proprietary stacks.

Scaling AI with Confidence and Governance

IBM achieved the highest score of 95% in the 2025 Foundation Model Transparency Index (FMTI), a significant improvement and a standout result as the average transparency score across developers dropped to 41%. This leadership is driven by models like Granite, IBM’s family of open, performant small language models (SLMs). The company’s commitment to openness enables businesses to deploy AI with greater reliability and control, addressing risks like bias and unpredictable behaviors by offering clarity into how models are built and governed.

Granite 4.0, the next generation of the model family, delivers advancements in efficiency with up to 2x faster inference and 70% lower memory requirements for complex tasks. These models are open source under Apache 2.0, cryptographically signed, and the first open models to receive ISO 42001 certification, demonstrating adherence to security, governance, and transparency best practices. IBM continues to expand its documentation to support responsible model development, extending the transparency principles recognized in the FMTI.

IBM’s software portfolio is designed to allow organizations to scale AI with confidence across any environment, integrating with models like Llama, Mistral, and Claude. watsonx provides visibility into AI usage, users, and safeguards, embedding governance, security, and observability into solutions. This approach enables proactive risk management and supports businesses in deploying and governing AI effectively, building on the trust established by models like Granite.

The efficiency gains of Granite 4.0 are rooted in its hybrid architecture. Rather than relying solely on dense transformer layers, the model integrates specialized components such as mixture-of-experts (MoE) routing and optimized attention mechanisms. This modular design allows the model to dynamically activate only the most relevant parts of the network for a given input token, drastically reducing the computational footprint during inference while maintaining the contextual depth characteristic of large language models.
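The routing idea described above can be sketched with a minimal top-k gating layer: a small gating network scores every expert for a token, and only the k best experts are actually evaluated. The shapes, the top-k scheme, and all dimensions below are illustrative assumptions, not Granite’s actual implementation.

```python
import numpy as np

def top_k_moe_layer(x, expert_weights, gate_weights, k=2):
    """Route a token embedding through only its top-k experts.

    x:              (d,) token embedding
    expert_weights: (n_experts, d, d) one weight matrix per expert
    gate_weights:   (d, n_experts) gating projection
    All shapes and the routing rule are illustrative, not Granite's.
    """
    logits = x @ gate_weights                # score every expert: (n_experts,)
    top = np.argsort(logits)[-k:]            # indices of the k best experts
    # Softmax over the selected experts only (numerically stabilized).
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()
    # Only k expert matrices are ever multiplied; the rest stay idle,
    # which is where the inference-time compute savings come from.
    out = sum(w * (expert_weights[i] @ x) for w, i in zip(weights, top))
    return out, top

rng = np.random.default_rng(0)
d, n_experts = 8, 16
x = rng.normal(size=d)
experts = rng.normal(size=(n_experts, d, d))
gate = rng.normal(size=(d, n_experts))
y, active = top_k_moe_layer(x, experts, gate, k=2)
print(len(active))  # 2 of 16 experts touched for this token
```

Because each token multiplies through only k of n_experts weight matrices, total parameter count (and hence capacity) can grow while per-token compute stays roughly constant.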

Furthermore, focusing on small language models (SLMs) provides a distinct advantage in enterprise deployment. While monolithic foundation models offer broad capability, SLMs offer specialization and deployment efficiency. Their reduced parameter count allows them to be fine-tuned rapidly on proprietary domain data, a process often termed domain adaptation, without requiring massive GPU clusters. This lowers the barrier to entry, enabling mission-critical AI tasks to operate reliably on resource-constrained or air-gapped local infrastructure.
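One common way to make that rapid fine-tuning cheap is a low-rank adapter (LoRA-style) update, which trains two small matrices instead of the full weight matrix. The arithmetic below shows why this shrinks the trainable-parameter count so sharply; the layer size of 4096 and rank of 8 are hypothetical values chosen for illustration, and the source does not state which fine-tuning method IBM uses.

```python
def lora_param_counts(d_in, d_out, rank):
    """Compare trainable parameters: full fine-tune vs low-rank adapter.

    Full fine-tuning updates the whole (d_in x d_out) weight matrix.
    A low-rank adapter freezes it and trains W + B @ A, where
    A is (rank x d_in) and B is (d_out x rank).
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora

# Hypothetical layer size and rank, for illustration only.
full, lora = lora_param_counts(4096, 4096, rank=8)
print(full, lora, full // lora)  # 16777216 65536 256
```

At rank 8 the adapter trains 256x fewer parameters per layer in this toy setup, which is why domain adaptation of an SLM can fit on modest hardware.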

The inclusion of ISO 42001 certification highlights a major shift from measuring raw model capability to quantifying verifiable trust. This standard mandates comprehensive lifecycle governance, covering data provenance, model testing, and adversarial attack resistance throughout the entire development pipeline. For regulated industries, this moves the conversation beyond merely achieving high benchmark scores; it establishes an auditable, repeatable framework that directly mitigates systemic operational risk associated with complex AI systems.

IBM’s leadership in the FMTI reinforces our mission to make AI safer, explainable and operable across any cloud, model or data environment.

Dr. Donovan

Dr. Donovan is a futurist and technology writer covering the quantum revolution. Where classical computers manipulate bits that are either on or off, quantum machines exploit superposition and entanglement to process information in ways that classical physics cannot. Dr. Donovan tracks the full quantum landscape: fault-tolerant computing, photonic and superconducting architectures, post-quantum cryptography, and the geopolitical race between nations and corporations to achieve quantum advantage. The decisions being made now, in research labs and government offices around the world, will determine who controls the most powerful computers ever built.

Latest Posts by Dr. Donovan:

SuperQ’s SuperPQC Platform Gains Global Visibility Through QSECDEF (April 11, 2026)

Database Reordering Cuts Quantum Search Circuit Complexity (April 11, 2026)

SPINS Project Aims for Millions of Stable Semiconductor Qubits (April 10, 2026)