Artificial Intelligence Training Sped Up Using Standard Internet Connections

Researchers are tackling a significant challenge in large language model (LLM) post-training with reinforcement learning (RL): the substantial infrastructure cost of synchronising model parameters across distributed systems. Chaoyi Ruan, Geng Luo and Xinyi Wan from the National University of Singapore (NUS), alongside Long Zhao and Qinghe Wang from Anhui University, Jiaan Zhu from the University of Science and Technology of China (USTC), and colleagues have developed SparrowRL, a novel system designed to overcome bandwidth limitations on standard Ethernet and wide area networks. The work is significant because it demonstrates a pathway to high-throughput RL without relying on expensive, dedicated remote direct memory access (RDMA) high-performance computing clusters, instead leveraging loosely coupled GPUs while achieving throughput comparable to RDMA systems. By representing updates as lossless sparse deltas and employing techniques such as pipelined extraction and throughput-aware scheduling, SparrowRL reduces transfer payloads by up to 79% and improves throughput by 2.4 to 9.5 times on Qwen3 models, ultimately delivering a more cost-effective approach to LLM refinement, with 1.21 to 1.59 times more tokens per dollar than reserved RDMA clusters.

This addresses a critical limitation in the field: current high-throughput reinforcement learning relies on expensive, dedicated high-performance computing clusters with specialised networking hardware. SparrowRL instead exploits the sparse nature of the parameter updates involved in refining LLMs, where only around one percent of a model's parameters change at each training step. Rather than transmitting full model weights, the system represents each update as a 'sparse delta checkpoint', a compact record of only the altered parameters, and ships those changes efficiently across ordinary networks. The deltas are captured bit-exactly, so nothing is lost to quantisation or dropped parameters, which is crucial for preserving model accuracy during post-training (the extract-and-apply step is sketched after this overview).

Evaluations using Qwen3 models ranging from 4 billion to 14 billion parameters, deployed across up to four geographically diverse regions, show that SparrowRL reduces per-step transfer payload by 79%, a saving most pronounced on the Qwen3-8B model. Throughput improves by a factor of 2.4 to 9.5 compared with traditional full-weight broadcasting over wide area networks, bringing performance closer to that of an ideal, dedicated research-grade cluster. By leveraging on-demand GPUs and standard network links, SparrowRL achieves 1.21 to 1.59 times more tokens per dollar than reserved, high-bandwidth clusters with comparable throughput. The system integrates seamlessly with existing high-throughput training and inference engines such as FSDP2 and vLLM, requiring no modifications to the underlying reinforcement learning algorithms.

Under the hood, the deltas are packaged as lossless sparse delta geo-checkpoints, unifying checkpoint storage and network transfer into a single abstraction. Version control is thereby embedded directly in the transport layer, simplifying the distributed protocol and supporting verifiable acceptance predicates. Within each checkpoint, the indices of changed parameters are compressed with delta encoding and variable-length unsigned LEB128 integers, cutting the index footprint by 30 to 50% and minimising metadata overhead: most gaps between consecutive changed indices fit in a single byte, while larger gaps fall back to multi-byte variable-length encoding (illustrated in the second sketch below).

To maximise efficiency, SparrowRL pipelines the extraction of these sparse deltas with multi-stream transmission, overlapping computation and communication, while rollout generation proceeds in parallel with data transfer to further minimise idle time; the third sketch below shows the overlap pattern in miniature.
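SparrowRL's implementation is not public, but the extract-and-apply step mentioned above is straightforward to picture. The PyTorch sketch below is a minimal illustration under illustrative assumptions (the function names and flat-index layout are not SparrowRL's API): it records only the parameters that differ between two versions of a weight tensor and writes them back bit-exactly.

```python
import torch

def extract_sparse_delta(old: torch.Tensor, new: torch.Tensor):
    """Return flat indices and new values of the parameters that changed.

    The comparison is exact, so the delta is lossless: nothing is quantised
    or dropped, and re-applying it reproduces the new weights bit for bit.
    """
    changed = (old != new).reshape(-1)
    indices = changed.nonzero(as_tuple=True)[0]   # flat positions of changed elements
    values = new.reshape(-1)[indices]             # new values at those positions
    return indices, values

def apply_sparse_delta(weights: torch.Tensor, indices: torch.Tensor, values: torch.Tensor):
    """Write the delta back into an existing copy of the weights, in place."""
    weights.view(-1)[indices] = values

# Roughly 1% of parameters change per RL step, so the delta is tiny next to the model.
old = torch.randn(1_000)
new = old.clone()
new[torch.randint(0, 1_000, (10,))] += 0.01       # simulate a sparse update
idx, vals = extract_sparse_delta(old, new)
apply_sparse_delta(old, idx, vals)
assert torch.equal(old, new)                      # bit-exact reconstruction
```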
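For the index compression, the article names two standard ingredients: delta encoding of the sorted indices followed by unsigned LEB128 varints. The self-contained sketch below illustrates that combination in plain Python; it is not SparrowRL's actual wire format.

```python
def encode_indices_leb128(indices):
    """Delta-encode sorted flat indices, then pack each gap as an unsigned LEB128 varint.

    With only ~1% of parameters changing, consecutive changed indices tend to be
    close together, so most gaps (values below 128) fit in a single byte.
    """
    out, prev = bytearray(), 0
    for idx in indices:                   # indices must be sorted ascending
        gap = idx - prev
        prev = idx
        while True:                       # unsigned LEB128: 7 payload bits per byte
            byte = gap & 0x7F
            gap >>= 7
            if gap:
                out.append(byte | 0x80)   # continuation bit: more bytes follow
            else:
                out.append(byte)
                break
    return bytes(out)

def decode_indices_leb128(data):
    """Invert the encoding: read varint gaps and accumulate them back into indices."""
    indices, prev, gap, shift = [], 0, 0, 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += gap
            indices.append(prev)
            gap, shift = 0, 0
    return indices

# Round trip: the small gaps cost one byte each, the 4879-element jump costs two.
idx = [3, 120, 121, 5000, 5130]
assert decode_indices_leb128(encode_indices_leb128(idx)) == idx
```

Compared with storing each changed position as a fixed 4- or 8-byte offset, packing mostly single-byte gaps is plausibly where the reported 30 to 50% reduction in index footprint comes from.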
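The pipelining can likewise be pictured as a small producer/consumer pair: one thread extracts the next delta while another is still transmitting the previous one. The sketch below is a deliberately simplified stand-in; the real system shards each payload over multiple streams and keeps rollout generation running in separate processes.

```python
import queue
import threading

def extraction_stage(steps, outbox: queue.Queue):
    """Producer: extract one sparse delta per training step and hand it to the sender."""
    for step in steps:
        delta = f"delta-for-step-{step}"      # stand-in for real sparse-delta extraction
        outbox.put(delta)                     # blocks only if the sender has fallen behind
    outbox.put(None)                          # sentinel: no more deltas

def transmission_stage(outbox: queue.Queue):
    """Consumer: send deltas as they arrive, overlapping with the next extraction."""
    while (delta := outbox.get()) is not None:
        # A real sender would split the payload across several concurrent TCP
        # streams to fill a high-latency WAN link; here we just simulate the send.
        print(f"sent {delta}")

outbox = queue.Queue(maxsize=2)               # small buffer keeps the two stages in lockstep
sender = threading.Thread(target=transmission_stage, args=(outbox,))
sender.start()
extraction_stage(range(3), outbox)            # extraction of step N+1 overlaps sending step N
sender.join()
```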
The system further incorporates throughput- and bandwidth-aware scheduling, intelligently allocating work to heterogeneous workers. A lease-based coordination mechanism lets workers claim and use available bandwidth efficiently, preventing congestion and maximising overall throughput (a toy sketch of the lease idea appears below, after the concluding remarks). The deployments tested spanned up to four geographic regions and used Qwen3 models from 4B to 14B parameters on on-demand H100 and A100 GPUs connected via standard Ethernet and WAN links.

The relentless pursuit of ever-larger language models has exposed a critical bottleneck: the infrastructure needed to refine them. SparrowRL represents a significant step towards democratising this process, demonstrating that effective fine-tuning isn't solely the domain of those with access to cutting-edge hardware. This work isn't simply about faster transfer speeds; it alters the economics of LLM development. By recognising the inherent sparsity of updates during reinforcement learning and transmitting only the differences, SparrowRL dramatically reduces the data load, allowing researchers to leverage commodity networks and even geographically distributed GPUs, opening up possibilities for collaborative training and reducing reliance on single, costly providers.

There are caveats. The throughput gap to ideal RDMA connections is narrowed but not eliminated, suggesting that network latency remains a factor, and the benefits of sparse updates may diminish as models grow even larger or as the nature of the fine-tuning task changes. The next frontier will likely involve more sophisticated compression, perhaps combining SparrowRL's delta-based approach with quantisation or other forms of parameter reduction, alongside scheduling algorithms that adapt dynamically to network conditions and worker heterogeneity. Ultimately, the goal is a truly elastic and accessible reinforcement learning ecosystem, where innovation isn't constrained by infrastructure costs.
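Finally, the lease-based coordination is described only at a high level, so the following is a toy illustration of the general idea rather than SparrowRL's protocol: a coordinator grants each worker a share of a shared uplink under a short, self-expiring lease, so a stalled worker cannot hold bandwidth indefinitely.

```python
import threading
import time

class BandwidthLeaser:
    """Toy coordinator: workers claim a slice of an uplink under a short lease."""

    def __init__(self, total_gbps: float, lease_seconds: float = 5.0):
        self.total_gbps = total_gbps
        self.lease_seconds = lease_seconds
        self.leases = {}                      # worker_id -> (granted_gbps, expiry_time)
        self.lock = threading.Lock()

    def claim(self, worker_id: str, wanted_gbps: float) -> float:
        """Grant up to wanted_gbps of the remaining capacity for one lease period."""
        with self.lock:
            now = time.monotonic()
            # Expired leases are reclaimed automatically: no worker holds bandwidth forever.
            self.leases = {w: (g, exp) for w, (g, exp) in self.leases.items() if exp > now}
            used = sum(g for g, _ in self.leases.values())
            granted = min(wanted_gbps, max(self.total_gbps - used, 0.0))
            if granted > 0:
                self.leases[worker_id] = (granted, now + self.lease_seconds)
            return granted

# A fast worker and a slower one share a 10 Gbps uplink; grants respect what is left.
leaser = BandwidthLeaser(total_gbps=10.0)
print(leaser.claim("h100-us-east", 6.0))      # 6.0
print(leaser.claim("a100-eu-west", 6.0))      # 4.0, only the remainder is granted
```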

👉 More information
🗞 RL over Commodity Networks: Overcoming the Bandwidth Barrier with Lossless Sparse Deltas
🧠 ArXiv: https://arxiv.org/abs/2602.11456

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
