Multiverse Computing Launches Pulsar 16B, Squeezing Parameters By Half

Multiverse Computing has launched Pulsar 16B, a new open reasoning model that achieves performance comparable to 30-billion-parameter models with a compressed count of 16.15 billion parameters. Only 3.1 billion of those parameters are active, which yields substantial efficiency gains while maintaining high performance. The model was validated on NVIDIA accelerated computing infrastructure. According to the company, Pulsar 16B outperforms the gpt-oss-20B model on nearly every benchmark despite its smaller size. With 32 concurrent requests on an NVIDIA Blackwell GPU, Pulsar 16B delivers 4,808 tokens per second of system throughput, a 43% increase over its base model, and reduces time-to-first-token from 2.18 seconds to 1.24 seconds.

Pulsar 16B: Nemotron Architecture and CompactifAI Compression

Pulsar 16B achieves performance similar to 30-billion-parameter models with a heavily compressed architecture, using only 3.1 billion active parameters out of a total of 16.15 billion. This is enabled by Multiverse Computing’s CompactifAI technology and NVIDIA’s Nemotron framework. The model is built on NVIDIA’s Nemotron 3 Nano, a hybrid Mamba2-Transformer with Mixture-of-Experts, and was compressed using the NVIDIA Model Optimizer and Megatron Bridge libraries, which substantially reduces the memory needed for model weights.
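
To make the weight-memory claim concrete, the following is a rough back-of-the-envelope sketch, not a figure published by Multiverse Computing. It assumes weight memory is simply parameter count times bytes per parameter, uses a round 30 billion as a stand-in for the larger reference model (the article gives no exact count), and treats the BF16 and FP8 precisions as illustrative choices. KV cache, activations, and runtime overhead are ignored.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumptions (not from the article): the 30.0 reference size is a round
# stand-in, and the precisions below are illustrative choices.

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0}

TOTAL_PARAMS_B = 16.15      # total parameters in billions (from the article)
ACTIVE_PARAMS_B = 3.1       # active parameters in billions (from the article)
REFERENCE_PARAMS_B = 30.0   # hypothetical round figure for a "30B-class" model


def weight_memory_gb(params_billion: float, precision: str) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def active_fraction(active_billion: float, total_billion: float) -> float:
    """Share of the total parameters that are active."""
    return active_billion / total_billion


for name, size in [("Pulsar 16B", TOTAL_PARAMS_B), ("30B-class", REFERENCE_PARAMS_B)]:
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(size, precision)
        print(f"{name:10s} {precision:5s} ~{gb:.2f} GB of weights")

share = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active share of parameters: {share:.1%}")
```

Under these assumptions the weights alone differ by roughly a factor of two at equal precision, which is the kind of gap that decides whether a model fits on a single GPU.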

The smaller weight footprint is particularly advantageous for deployment on systems with limited GPU memory or in single-node environments, where larger models would be impractical. Time-to-first-token also fell from 2.18 seconds to 1.24 seconds, so users see the first output sooner.

Enrique Lizaso, cofounder and CEO of Multiverse Computing, said, “Running advanced AI locally has historically required compromising on model size or performance.” He continued, “What we’re demonstrating with Pulsar 16B is that frontier-grade reasoning can now be deployed without the overhead of cloud-scale infrastructure, at a footprint enterprises can actually run and scale economically.”

According to the company, the compression process focused on preserving reasoning quality, instruction-following behavior, and tool-use interfaces, so that the smaller model would not lose functionality. Long-context performance also remained largely intact: needle-in-a-haystack retrieval stayed perfect even at the 100,000-token mark, which suggests the compression method avoids the degradation typical in such scenarios.

NVIDIA Blackwell GPU Achieves 43% Throughput Increase with Pulsar 16B

Demand for more capable AI models keeps pushing the limits of computational resources, yet practical deployment often forces a trade-off between model size and performance. Pulsar 16B suggests a way to retain high-level reasoning within a much smaller parameter count. Multiverse Computing, in collaboration with NVIDIA, has released Pulsar 16B, a 16.15 billion parameter model designed to deliver performance comparable to larger 30 billion parameter architectures. This is notable because only 3.1 billion of those parameters are active. Validation on NVIDIA accelerated computing infrastructure shows a considerable throughput boost on NVIDIA Blackwell GPUs: with 32 concurrent requests, Pulsar 16B achieves 4,808 tokens per second, a 43% increase over the 3,363 tokens per second delivered by the uncompressed base model.

The throughput gain comes with a drop in time-to-first-token, from 2.18 seconds to 1.24 seconds. Beyond speed, Pulsar 16B maintains strong performance across a range of benchmarks: on the AIME benchmark, it scores within a tenth of a point of its uncompressed counterpart and outperforms the gpt-oss-20B model by 15 points. The model also posts competitive results on GPQA-Diamond, a PhD-level science question-answering task, and shows improvements in instruction following, function calling, and math reasoning. The model is now available on Hugging Face under the Apache 2.0 license, with detailed technical documentation accessible at multiversecomputing.com.

With 32 concurrent requests on an NVIDIA Blackwell GPU, Pulsar 16B (FP8) delivers 4,808 tokens/second system throughput, a 43% increase over the base model’s 3,363 tok/s, while reducing time-to-first-token (TTFT) from 2.18s to 1.24s.
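
The headline percentages can be checked directly from the reported figures. The short sketch below recomputes them; the function names are illustrative, and the per-request number is just the system throughput divided evenly across 32 requests, which is a simplifying assumption rather than a measured value.

```python
# Recompute the reported gains from the figures quoted in the article.

BASE_TOK_S = 3363.0      # base model system throughput (tokens/second)
PULSAR_TOK_S = 4808.0    # Pulsar 16B system throughput (tokens/second)
BASE_TTFT_S = 2.18       # base model time-to-first-token (seconds)
PULSAR_TTFT_S = 1.24     # Pulsar 16B time-to-first-token (seconds)
CONCURRENCY = 32         # concurrent requests in the reported test


def pct_increase(old: float, new: float) -> float:
    """Percentage increase from old to new."""
    return (new - old) / old * 100.0


def pct_reduction(old: float, new: float) -> float:
    """Percentage reduction from old to new."""
    return (old - new) / old * 100.0


def per_request(system_tok_s: float, concurrency: int) -> float:
    """Average tokens/second per request, assuming an even split."""
    return system_tok_s / concurrency


print(f"Throughput increase: {pct_increase(BASE_TOK_S, PULSAR_TOK_S):.1f}%")
print(f"TTFT reduction:      {pct_reduction(BASE_TTFT_S, PULSAR_TTFT_S):.1f}%")
print(f"Per-request rate:    {per_request(PULSAR_TOK_S, CONCURRENCY):.2f} tok/s")
```

Both the throughput increase and the time-to-first-token reduction work out to roughly 43%, consistent with the reported figure for throughput.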


Rusty Flint
