Multiverse Computing, a specialist in artificial intelligence model compression, has launched its CompactifAI Application Programming Interface (API) on Amazon Web Services (AWS). An API is a set of rules and specifications that software programs follow to communicate with one another. The CompactifAI API provides access to pre-compressed, optimised large language models (LLMs), including Meta Llama, DeepSeek and Mistral, enabling scalable and cost-effective deployment via AWS Marketplace. The release follows a year of development and collaboration with AWS, facilitated by Multiverse Computing’s participation in the 2024 AWS Generative AI Accelerator Program.
Development and Strategic Alignment
Over a year of development culminated in the launch of the CompactifAI API on AWS, informed by Multiverse Computing’s participation in the 2024 AWS Generative AI Accelerator Program. The programme facilitated close collaboration with AWS to optimise leading AI models for both performance and cost-efficiency. Strategic development focused on designing a seamless user onboarding experience through AWS Marketplace and on establishing a go-to-market strategy to ensure scalability and customer success.
The resulting API provides a serverless access layer that uses Amazon SageMaker HyperPod to scale inference across GPU clusters. At launch, the CompactifAI API offers compressed versions of the Meta Llama, DeepSeek and Mistral models, giving users a range of options tailored to specific performance and cost requirements. Multiverse Computing has prioritised a user-friendly format, evidenced by a dedicated landing page and Marketplace listing with model cards detailing specifications, pricing and performance metrics. The landing page also features comparative tools to aid model selection, comprehensive API documentation and clearly defined licensing terms.
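For orientation, the sketch below shows what calling a hosted LLM endpoint of this kind typically looks like from Python. The endpoint URL, model identifier, authentication scheme and response schema are illustrative assumptions, not the documented CompactifAI interface; the actual details are in the API documentation on the landing page.

```python
import json
import os
import urllib.request

# Hypothetical endpoint, model name and auth scheme for illustration only;
# the real values come from the CompactifAI API documentation.
API_URL = "https://api.example.com/v1/chat/completions"
API_KEY = os.environ["COMPACTIFAI_API_KEY"]

payload = {
    "model": "llama-compressed",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Summarise model compression in one sentence."}
    ],
    "max_tokens": 128,
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
)

with urllib.request.urlopen(request) as response:
    body = json.loads(response.read())

# Assumes an OpenAI-style response schema; adjust to the actual API.
print(body["choices"][0]["message"]["content"])
```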
Early Adoption and Performance Gains
Early adopters are already reporting performance gains from integrating the CompactifAI API. Luzia, an AI assistant provider, reports a reduction of more than 50% in model footprint following implementation, while maintaining response quality and lowering both latency and associated costs. Multiverse Computing’s compression techniques achieve up to 95% model size reduction with a reported accuracy loss of only 2-3%.
Smaller models translate directly into reduced computational demands and infrastructure requirements, since less memory and less compute are needed to serve the same workload.
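To make the headline figure concrete, the back-of-the-envelope calculation below applies a 95% size reduction to an illustrative 70-billion-parameter model stored in 16-bit weights. The parameter count and precision are assumptions chosen for the example, not figures from Multiverse Computing.

```python
# Back-of-the-envelope memory estimate; the parameter count and weight
# precision are illustrative assumptions, not CompactifAI measurements.
params = 70e9            # example: a 70B-parameter model
bytes_per_param = 2      # FP16 weights
original_gb = params * bytes_per_param / 1e9

reduction = 0.95         # up to 95% size reduction, per Multiverse Computing
compressed_gb = original_gb * (1 - reduction)

print(f"Original weights:   {original_gb:,.0f} GB")   # 140 GB
print(f"Compressed weights: {compressed_gb:,.0f} GB")  # 7 GB
```

At that scale, a model that would otherwise span several GPUs can fit within a single accelerator's memory, which illustrates where the infrastructure savings come from.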
Scalability and User Resources
The CompactifAI API utilises Amazon SageMaker HyperPod to scale inference across a cluster of GPUs, enabling a serverless large language model (LLM) access layer. Users access the API and models through AWS Marketplace, benefiting from a streamlined onboarding process supported by the model cards, side-by-side comparison tool, API documentation and licensing terms described above. Multiverse Computing also offers built-in go-to-market support, leveraging AWS Marketplace to reach customers and drive sales, and states that its compression technology reduces LLM computational requirements by up to 95% with a reported accuracy loss of only 2-3%.
