Multiverse Computing Launches AI Model Compression API on AWS Marketplace

Multiverse Computing, a specialist in artificial intelligence model compression, has launched its CompactifAI Application Programming Interface (API) on Amazon Web Services (AWS). An API is a set of rules and specifications that software programs follow to communicate with each other; this one provides access to pre-compressed, optimised large language models (LLMs) – including Meta Llama, DeepSeek and Mistral – enabling scalable and cost-effective deployment via AWS Marketplace. The release follows a year of development and collaboration with AWS, facilitated by Multiverse Computing’s participation in the 2024 AWS Generative AI Accelerator Program.

Development and Strategic Alignment

A year of development, informed by Multiverse Computing’s participation in the 2024 AWS Generative AI Accelerator Program, culminated in the launch of the CompactifAI API on AWS. The programme enabled close collaboration with AWS to optimise leading AI models for both performance and cost-efficiency. Development focused on designing a seamless user onboarding experience through AWS Marketplace and establishing a go-to-market strategy to ensure scalability and customer success.

The resulting API provides a serverless layer that uses Amazon SageMaker HyperPod to scale inference across GPU clusters. The CompactifAI API currently offers compressed versions of Meta Llama, DeepSeek and Mistral models, giving users a range of options tailored to specific performance and cost requirements. Multiverse Computing prioritised a user-friendly format: a dedicated landing page and Marketplace listing offer model cards detailing specifications, pricing and performance metrics, alongside comparative tools to aid model selection, comprehensive API documentation, and clearly defined licensing terms.
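As a rough illustration of what consuming such a serverless LLM layer looks like, the sketch below assembles a chat-completion request in the common OpenAI-style REST shape. The endpoint URL, model slug, and header layout are illustrative assumptions, not confirmed details of the CompactifAI API; consult the official API documentation for the real values.

```python
import json

# Placeholder endpoint: an assumption for illustration, not the real base URL.
API_BASE = "https://api.compactifai.example/v1"

def build_chat_request(model: str, prompt: str, api_key: str) -> dict:
    """Assemble the URL, headers, and JSON body for a chat-completion call."""
    return {
        "url": f"{API_BASE}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": model,  # hypothetical slug for a compressed Llama variant
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 128,
        }),
    }

req = build_chat_request(
    "cai-llama-3-1-8b-slim",  # illustrative model name
    "Summarise model compression in one sentence.",
    "sk-demo",
)
```

From the caller's perspective, a compressed model behind a serverless layer is invoked exactly like its full-size counterpart; only the model identifier (and the resulting latency and cost) changes.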

Early Adoption and Performance Gains

Early adopters report performance gains from integrating the CompactifAI API. Luzia, an AI assistant provider, reports a reduction of more than 50% in model footprint following implementation, while maintaining response quality and lowering latency and associated costs. Multiverse Computing’s compression techniques achieve up to 95% model size reduction with a reported accuracy loss of only 2–3%.

This compression directly translates to reduced computational demands and infrastructure requirements for users. The availability of Meta Llama, DeepSeek, and Mistral compressed models within the API provides a range of options tailored to varying performance and cost sensitivities.
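To make the headline figures concrete, the snippet below applies the article's "up to 95% size reduction" claim to a back-of-the-envelope memory estimate. The fp16 baseline (2 bytes per parameter) and the 70B-parameter example are illustrative assumptions, not figures from the article.

```python
def compressed_footprint(params_billions: float,
                         bytes_per_param: float = 2.0,
                         reduction: float = 0.95) -> tuple[float, float]:
    """Return (original_gb, compressed_gb) for a given parameter count.

    Illustrative arithmetic only: applies the article's headline figure of
    up to 95% size reduction to an fp16 (2 bytes/parameter) baseline.
    """
    original_gb = params_billions * 1e9 * bytes_per_param / 1e9
    compressed_gb = original_gb * (1.0 - reduction)
    return original_gb, compressed_gb

# A hypothetical 70B-parameter model at fp16 weighs roughly 140 GB;
# a 95% reduction would bring it down to roughly 7 GB.
orig, small = compressed_footprint(70)
```

Under these assumptions, a model that previously demanded a multi-GPU server could fit on a single commodity accelerator, which is the infrastructure saving the paragraph above describes.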

Scalability and User Resources

The CompactifAI API utilises Amazon SageMaker HyperPod to scale inference across a cluster of GPUs, enabling a serverless large language model (LLM) access layer. Users access the API and models through AWS Marketplace, benefiting from a streamlined onboarding process. Model cards provide detailed specifications, pricing, and performance metrics, facilitating informed selection.

A side-by-side comparison tool assists users in choosing the optimal model for their specific requirements. Comprehensive API documentation and clear licensing terms are also provided. Multiverse Computing offers built-in go-to-market support, leveraging AWS Marketplace to reach customers and drive sales. The company states its compression technology reduces LLM computational requirements by up to 95% with a reported accuracy loss of only 2-3%.


Dr. Donovan
