Multiverse Computing has launched Pulsar 16B, a new open reasoning model that achieves performance comparable to 30-billion-parameter models with a compressed total of 16.15 billion parameters. Only 3.1 billion of those parameters are active, which yields substantial efficiency gains while maintaining high performance, and the model was validated on NVIDIA accelerated computing infrastructure. According to the company, Pulsar 16B outperforms gpt-oss-20B on nearly every benchmark despite its smaller size. With 32 concurrent requests on an NVIDIA Blackwell GPU, Pulsar 16B delivers 4,808 tokens per second of system throughput, a 43% increase over its base model, and reduces time-to-first-token from 2.18 seconds to 1.24 seconds.
Pulsar 16B: Nemotron Architecture and CompactifAI Compression
Pulsar 16B reaches performance similar to 30-billion-parameter models with a heavily compressed architecture: of its 16.15 billion total parameters, only 3.1 billion are active. This is made possible by Multiverse Computing’s CompactifAI technology combined with NVIDIA Nemotron. The model is built on NVIDIA Nemotron 3 Nano, a hybrid Mamba2-Transformer with Mixture-of-Experts (MoE) layers, and was compressed using the NVIDIA Model Optimizer and Megatron Bridge libraries, substantially reducing the memory needed for model weights.
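The ratio of active to total parameters, and the memory the weights occupy, can be checked with simple arithmetic. The sketch below uses the article's parameter counts together with the standard sizes of 2 bytes per parameter for BF16 and 1 byte for FP8 (the precision listed for the Blackwell benchmark). It counts weights only: KV cache, activations, and runtime overhead are excluded, and the BF16 figure is illustrative rather than a statement about how the model is distributed.

```python
# Back-of-the-envelope figures for a Mixture-of-Experts model.
# Parameter counts come from the article; bytes-per-parameter values are the
# standard storage sizes of each numeric format. Estimates cover weights only.

TOTAL_PARAMS = 16.15e9   # total parameters in Pulsar 16B
ACTIVE_PARAMS = 3.1e9    # parameters used per token (in a typical MoE: shared layers + routed experts)

BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def active_fraction(total: float, active: float) -> float:
    """Share of the weights that participate in computing each token."""
    return active / total


def weight_memory_gb(params: float, dtype: str) -> float:
    """Weight storage in gigabytes (10**9 bytes); excludes KV cache and activations."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype} weights: {weight_memory_gb(TOTAL_PARAMS, dtype):.1f} GB")
```

Because only about a fifth of the weights run for each token, per-token compute tracks the 3.1 billion active parameters, while GPU memory still has to hold all 16.15 billion.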
This is particularly advantageous for deployment on systems with limited GPU memory or in single-node environments, where larger models would be impractical. Time-to-first-token also falls from 2.18 seconds to 1.24 seconds, so users see responses sooner. Enrique Lizaso, cofounder and CEO of Multiverse Computing, said, “Running advanced AI locally has historically required compromising on model size or performance.” He continued, “What we’re demonstrating with Pulsar 16B is that frontier-grade reasoning can now be deployed without the overhead of cloud-scale infrastructure, at a footprint enterprises can actually run and scale economically.”
During compression, the company focused specifically on preserving reasoning quality, instruction-following behavior, and tool-use interfaces, so that the smaller model would not lose functionality. Long-context performance also remained largely intact: needle-in-a-haystack retrieval stayed perfect even at 100,000 tokens, suggesting the compression method avoids the degradation typically seen in such scenarios.
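A needle-in-a-haystack test hides a single fact inside long filler text and asks the model to retrieve it. The sketch below shows one common way to build such a prompt and score the answer; it is not Multiverse Computing's evaluation harness. The function names are hypothetical, length is approximated in words rather than tokenizer tokens, and the call to the model itself is left out.

```python
import random


def build_haystack_prompt(needle: str, question: str, filler_sentences: list[str],
                          target_words: int, depth: float, seed: int = 0) -> str:
    """Hypothetical helper: pad with filler to roughly `target_words` words and
    bury `needle` at a relative depth (0.0 = start of context, 1.0 = end)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    rng = random.Random(seed)  # fixed seed so each trial is reproducible
    body: list[str] = []
    words_so_far = 0
    while words_so_far < target_words:
        sentence = rng.choice(filler_sentences)
        body.append(sentence)
        words_so_far += len(sentence.split())
    body.insert(round(depth * len(body)), needle)  # place the hidden fact
    return " ".join(body) + f"\n\nQuestion: {question}\nAnswer:"


def score_retrieval(model_answer: str, expected: str) -> bool:
    """Hypothetical scorer: the trial passes if the expected fact appears in the answer."""
    return expected.lower() in model_answer.lower()


if __name__ == "__main__":
    filler = ["The sky was clear that morning.", "Birds sang in the trees."]
    prompt = build_haystack_prompt("The secret code is 4721.", "What is the secret code?",
                                   filler, target_words=2000, depth=0.5)
    print(len(prompt.split()), "words in prompt")
    print(score_retrieval("The code is 4721.", "4721"))
```

Sweeping `depth` from 0.0 to 1.0 at several context lengths (for example up to roughly 100,000 tokens) and recording `score_retrieval` for each trial gives a grid of results; in a sweep like this, a perfect score would mean every combination passed.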
Pulsar 16B Delivers 43% Higher Throughput on NVIDIA Blackwell GPU
The demand for increasingly capable artificial intelligence models continues to push the limits of computational resources, yet practical deployment often forces a compromise between model size and performance. Recent advances, however, suggest a way to retain high-level reasoning capabilities with a significantly reduced parameter count. Multiverse Computing, in collaboration with NVIDIA, has released Pulsar 16B, a 16.15-billion-parameter model designed to deliver performance comparable to larger 30-billion-parameter models. This is particularly notable given that Pulsar 16B activates only 3.1 billion parameters. Validation on NVIDIA accelerated computing infrastructure shows a considerable throughput boost on NVIDIA Blackwell GPUs: with 32 concurrent requests, Pulsar 16B achieves 4,808 tokens per second, a 43% increase over the 3,363 tokens per second delivered by the uncompressed base model.
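The 43% figure follows directly from the two throughput numbers. The sketch below reproduces it and also divides system throughput by the 32 concurrent requests; that per-request average assumes the load is spread evenly across requests, which the article does not state.

```python
def relative_gain(new: float, old: float) -> float:
    """Fractional improvement of `new` over `old` (0.43 means +43%)."""
    return new / old - 1.0


PULSAR_TPS = 4808.0  # Pulsar 16B system throughput, tokens/s, 32 concurrent requests
BASE_TPS = 3363.0    # uncompressed base model, same setup
CONCURRENCY = 32     # number of concurrent requests in the benchmark

gain = relative_gain(PULSAR_TPS, BASE_TPS)
# Assumption: throughput is shared evenly, so this is an average, not a measured per-stream rate.
per_request = PULSAR_TPS / CONCURRENCY

print(f"throughput gain: {gain:.1%}")
print(f"average per-request rate: {per_request:.2f} tok/s")
```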
This performance gain comes with a shorter time-to-first-token, which falls from 2.18 seconds to 1.24 seconds. Beyond speed, Pulsar 16B maintains strong benchmark results: on AIME, it scores within a tenth of a point of its uncompressed counterpart and outperforms gpt-oss-20B by 15 points. The model is also competitive on GPQA-Diamond, a PhD-level science question-answering benchmark, and shows improvements in instruction following, function calling, and math reasoning. The model is now available on Hugging Face under the Apache 2.0 license, with detailed technical documentation at multiversecomputing.com.
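The time-to-first-token change can be expressed either as a percentage reduction or as a speedup factor. The sketch below computes both from the article's figures of 2.18 s and 1.24 s; the variable names are illustrative.

```python
TTFT_BASE_S = 2.18    # base model time-to-first-token, seconds
TTFT_PULSAR_S = 1.24  # Pulsar 16B time-to-first-token, seconds

# Fraction of the original wait that is removed.
reduction = (TTFT_BASE_S - TTFT_PULSAR_S) / TTFT_BASE_S
# How many times faster the first token arrives.
speedup = TTFT_BASE_S / TTFT_PULSAR_S

print(f"TTFT reduction: {reduction:.1%}")
print(f"TTFT speedup: {speedup:.2f}x")
```

Both framings describe the same measurement: roughly a 43% shorter wait, or about 1.76 times faster to the first token. The headline 43% in the article refers to throughput; the similar-looking latency figure is a separate metric.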
With 32 concurrent requests on an NVIDIA Blackwell GPU, Pulsar 16B (FP8) delivers 4,808 tokens/second system throughput, a 43% increase over the base model’s 3,363 tok/s, while reducing time-to-first-token (TTFT) from 2.18s to 1.24s.
Source: https://multiversecomputing.com/
