Repository Planning Graph Enables Scalable Codebase Generation, Achieving Coherent Planning for Complete Repositories

Generating complete software repositories from scratch remains a significant hurdle in automated code creation, despite recent advances in function-level and file-level code generation. Jane Luo, Xin Zhang, and Steven Liu, along with their colleagues, tackle this challenge by introducing the Repository Planning Graph, or RPG. The RPG unifies the often-disparate stages of software planning and implementation into a single, coherent graph representing capabilities, file structures, and data flows. By replacing ambiguous natural language instructions with this explicit blueprint, the team’s framework, ZeroRepo, produces codebases averaging nearly 36,000 lines, well beyond the output of leading baseline systems, while attaining high functional coverage and test pass rates on a benchmark of real-world projects. The work also improves the ability of large language models to understand and navigate complex software structures, which could support more scalable software development tools.

RPG Guides Code Replication and Innovation

Scientists have developed a method for guiding code generation and evolution using the Repository Planning Graph (RPG), a structured representation of a software repository. The RPG serves as a blueprint: it guides an AI agent through the codebase and helps it identify relevant components and dependencies, so the agent can systematically explore the repository, replicate existing functionality, and introduce new features through a more structured and efficient search. Experiments reveal that agents guided by the RPG achieve high coverage, introduce meaningful novelty, and generate code of reasonable quality. The generated code includes features such as Prophet forecasting and STL decomposition in MLKit-Py.

Repository Planning with Unified Graph Representation

Scientists have created ZeroRepo, a groundbreaking framework for generating complete software repositories from scratch, addressing a fundamental challenge in automated code generation. The core of this work is the Repository Planning Graph (RPG), a persistent representation that unifies high-level planning and detailed implementation by encoding capabilities, file structures, data flows, and functions within a single graph. This approach replaces ambiguous natural language instructions with an explicit blueprint, enabling long-horizon planning and scalable repository generation. ZeroRepo operates in three distinct stages, beginning with RPG construction, followed by code generation through graph traversal, and concluding with test-driven development to ensure incremental expansion and stability.
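The idea of a single graph that drives generation can be illustrated with a minimal sketch. The class, method, and node names below are hypothetical and are not ZeroRepo's actual API; the sketch only assumes that nodes carry a kind label, that edges record data flows, and that code is produced by visiting nodes so that producers come before consumers, with a test check after each step.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable


@dataclass
class RepoPlanGraph:
    """Toy planning graph. All names are illustrative, not ZeroRepo's API."""
    # node name -> kind label, e.g. "capability", "file", or "function"
    kinds: dict[str, str] = field(default_factory=dict)
    # node name -> set of nodes whose output it consumes (incoming data flows)
    inputs: dict[str, set[str]] = field(default_factory=dict)

    def add_node(self, name: str, kind: str) -> None:
        self.kinds[name] = kind
        self.inputs.setdefault(name, set())

    def add_flow(self, source: str, target: str) -> None:
        if source not in self.kinds or target not in self.kinds:
            raise KeyError("both endpoints must be added as nodes first")
        self.inputs[target].add(source)

    def build_order(self) -> list[str]:
        # Producers are listed before consumers; a cyclic plan raises
        # graphlib.CycleError.
        return list(TopologicalSorter(self.inputs).static_order())


def generate(graph: RepoPlanGraph,
             implement: Callable[[str], None],
             passes_tests: Callable[[str], bool]) -> list[str]:
    """Visit nodes in dependency order and stop at the first failing test."""
    done: list[str] = []
    for name in graph.build_order():
        implement(name)
        if not passes_tests(name):
            break  # hand the failing node back for repair before expanding
        done.append(name)
    return done


# Example plan: three functions connected by two data flows.
plan = RepoPlanGraph()
for node in ("load_data", "preprocess", "train_model"):
    plan.add_node(node, "function")
plan.add_flow("load_data", "preprocess")
plan.add_flow("preprocess", "train_model")
order = plan.build_order()  # ['load_data', 'preprocess', 'train_model']
```

Keeping the plan as explicit data means the traversal order is derived from the graph rather than from free-form instructions, which is the property the article attributes to the RPG.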

To evaluate ZeroRepo, the researchers constructed RepoCraft, a benchmark comprising six real-world Python projects with a total of 1,052 tasks. On this benchmark, ZeroRepo produces repositories averaging nearly 36,000 lines of code, significantly exceeding the output of existing systems. The team also implemented graph-guided localization and editing tools to handle implementation and debugging requests: the tools retrieve the associated code, allow it to be modified, and validate changes through a staged workflow aligned with the graph structure. A lightweight majority-vote diagnosis distinguishes genuine implementation errors from environmental issues, automatically handling the latter and returning the former for repair.
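A majority-vote diagnosis of this kind might look roughly like the sketch below. The article does not say how votes are collected or how ties are resolved, so the labels, the function names, and the rule that ties count as implementation errors are all assumptions made for illustration.

```python
from collections import Counter
from typing import Sequence

# Hypothetical labels; the source does not specify the actual categories.
IMPLEMENTATION = "implementation"
ENVIRONMENT = "environment"


def diagnose(votes: Sequence[str]) -> str:
    """Return the diagnosis for one failing test from independent votes."""
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    # Only a strict majority of "environment" votes clears the code.
    # Ties and unknown labels fall back to "implementation" (an assumption),
    # so a real bug is not silently dismissed.
    if counts[ENVIRONMENT] > len(votes) / 2:
        return ENVIRONMENT
    return IMPLEMENTATION


def route(votes: Sequence[str]) -> str:
    """Map a diagnosis to the next action described in the text."""
    if diagnose(votes) == ENVIRONMENT:
        return "handle_automatically"
    return "return_for_repair"


# Example: two of three judgments blame the environment.
action = route([ENVIRONMENT, ENVIRONMENT, IMPLEMENTATION])
```

In practice each vote would come from a separate judgment of the failure log; here they are passed in directly to keep the example self-contained.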

ZeroRepo Generates Complete Software Repositories Automatically

Scientists have achieved a breakthrough in automated repository generation, creating ZeroRepo, a system that significantly outperforms existing approaches. ZeroRepo leverages a novel framework called the Repository Planning Graph (RPG) to meticulously plan and construct software repositories from scratch, moving beyond simple code generation to encompass complete project structure and functionality. Experiments on the RepoCraft benchmark, comprising six real-world projects with 1,052 tasks, demonstrate ZeroRepo’s remarkable capabilities. The team measured repository size in lines of code and tokens, revealing that ZeroRepo, when paired with the Qwen3-Coder model, generates repositories averaging 36,941 lines of code and 445,511 tokens, a substantial increase compared to existing systems.

Beyond sheer size, ZeroRepo achieves 81.5% functional coverage and a 69.7% pass rate, exceeding the strongest baseline by a considerable margin. ZeroRepo also demonstrates a capacity for innovation, achieving a novelty rate of 13.6%, corresponding to the creation of over 151 new functionalities. Detailed analysis of the generated repositories reveals a coherent structure with layered dependencies and coordinated execution across modules and functions, indicating that ZeroRepo can produce repositories that resemble the complexity of real-world software.

Graph-Driven Repository Generation Outperforms Existing Methods

Scientists have presented a novel approach to automated repository generation, introducing the Repository Planning Graph (RPG) as a structured representation that unifies both high-level planning and detailed implementation. By encoding capabilities, file structures, data flows, and functions within a single graph, RPG overcomes limitations inherent in natural language approaches. The team developed ZeroRepo, a graph-driven framework built upon RPG, to generate complete repositories from scratch, demonstrating significant advancements in scalability and accuracy. Evaluations using the RepoCraft benchmark, comprising six real-world projects, reveal that ZeroRepo substantially outperforms existing methods, producing repositories with nearly 36,000 lines of code on average. Analysis indicates that RPG effectively models complex dependencies, allowing for progressively more sophisticated planning as functionality and code size increase, and enhances an agent’s understanding of the repository structure, accelerating the development process. The authors acknowledge that further work is needed to address the challenges of generating even more complex and nuanced software systems, including exploring methods to automate the creation of the initial RPG and expanding the framework’s ability to handle diverse project requirements.

👉 More information
🗞 RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation
🧠 ArXiv: https://arxiv.org/abs/2509.16198

Rohail T.

