Researchers are tackling the challenge of equipping Large Language Models (LLMs) with the ability to solve complex coding problems. Shahd Seddik, Fahd Seddik, and Iman Saberi, together with Fatemeh Fard, Minh Hieu Huynh, and Patanamon Thongtanunam from the University of Melbourne, Australia, present a novel approach utilising Programming Knowledge Graphs (PKGs) to represent code and text with greater semantic understanding. Their work addresses the limitations of current Retrieval-Augmented Generation (RAG) techniques, which often struggle to identify relevant context and can produce inaccurate or ‘hallucinated’ results. By structuring external data into finer-grained nodes and employing tree pruning alongside a re-ranking mechanism, the approach achieves gains of up to 20% in pass@1 accuracy on the HumanEval benchmark and a remarkable 34% improvement on MBPP, a significant step forward in reliable and precise code generation.
These models often struggle to identify relevant context and are prone to generating irrelevant or hallucinated outputs, issues the PKG aims to resolve through structured knowledge representation. By constructing PKGs, the team achieves greater precision in retrieval and mitigates hallucinations via a re-ranking mechanism that integrates solutions from beyond standard RAG techniques.
The team constructed both code-centric and text-centric PKGs, representing knowledge as structured graphs to improve precision and reduce hallucinations. The code-centric PKG is built by parsing source code into an Abstract Syntax Tree (AST)-derived hierarchy that mirrors syntactic containment within functions and blocks. This hierarchical structure allows retrieval at varying granularities, offering a trade-off between recall and precision: coarser units increase recall, while finer units enhance precision. To explore this trade-off, the researchers instantiated two retrieval settings, Func-PKG retrieving at the function level and Block-PKG at the block level, and experiments with both demonstrate the effectiveness of this granularity control, providing a nuanced approach to knowledge assembly. Furthermore, the research establishes that the approach not only improves performance on challenging problems but also maintains the accuracy of already-correct solutions, a crucial aspect often overlooked in RAG advancements. The replication package for this work is publicly available at https://github.
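The AST-derived hierarchy described above can be sketched in a few lines of Python. The snippet below is a minimal illustration, not the paper's actual construction: the node naming scheme, the choice of block types, and the dictionary-based graph are all assumptions made for clarity. It records function-level nodes (the Func-PKG granularity) and block-level nodes (the Block-PKG granularity), with edges mirroring syntactic containment.

```python
import ast

def build_code_pkg(source: str):
    """Illustrative code-centric PKG: parse source into an AST and record
    function-level and block-level nodes, linked by containment edges.
    Node IDs and block types here are hypothetical, not the paper's schema."""
    tree = ast.parse(source)
    nodes, edges = {}, []
    for func in (n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)):
        fid = f"func:{func.name}"
        nodes[fid] = ast.get_source_segment(source, func)  # Func-PKG granularity
        for block in ast.walk(func):
            if isinstance(block, (ast.For, ast.While, ast.If, ast.With)):
                bid = f"block:{func.name}:{block.lineno}"
                nodes[bid] = ast.get_source_segment(source, block)  # Block-PKG granularity
                edges.append((fid, bid))  # syntactic containment edge
    return nodes, edges
```

Retrieving over `func:` nodes corresponds to the coarser, higher-recall setting, while retrieving over `block:` nodes corresponds to the finer, higher-precision one.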
Evaluation shows that the proposed PKG, combined with a re-ranking mechanism, effectively addresses complex problems while minimally impacting solutions that were already correct without RAG. The data shows a significant reduction in Assertion errors on the MBPP dataset, although a slight increase in Name errors was observed. Topic analysis on MBPP highlighted difficulties in solving string manipulation problems with PKG-based RAG. The work delivers a novel knowledge representation for RAG, enabling retrieval at multiple granularities and incorporating tree pruning for explicit control over context. Measurements confirm that the re-ranker prioritises promising solutions, including those generated without RAG, reducing the influence of potentially erroneous retrieved data and mitigating hallucinations.
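The re-ranking idea, selecting among candidate solutions rather than trusting retrieved context unconditionally, can be sketched as follows. This is a simplified assumption-laden illustration: the scoring function (a bare syntax check via `compile`) and the tie-breaking rule favouring the non-RAG candidate are hypothetical stand-ins for the paper's actual re-ranker.

```python
def rerank(candidates):
    """Illustrative re-ranker over (code, from_rag) candidate pairs.
    Scores each candidate with a hypothetical heuristic (does it parse?)
    and, on ties, prefers the candidate generated without RAG, reducing
    the influence of potentially erroneous retrieved context."""
    def parses(code: str) -> int:
        try:
            compile(code, "<candidate>", "exec")
            return 1
        except SyntaxError:
            return 0
    # Higher score wins; among equal scores, not-from-RAG (True) sorts higher.
    return max(candidates, key=lambda c: (parses(c[0]), not c[1]))
```

A real re-ranker would use a learned or execution-aware scorer; the point here is only the selection step that lets a non-RAG solution win when retrieval adds no value.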
PKG boosts coding LLM accuracy significantly
Furthermore, the study highlights that simply enhancing retrieval is insufficient; effective candidate selection through re-ranking is a primary driver of improved performance. The authors acknowledge that the applicability of their knowledge graph may be limited to specific contexts, recommending the construction of new graphs tailored to individual projects where necessary. Future work could focus on learning re-rankers that utilise execution-aware signals, incorporating specification checks for mathematically focused tasks, and creating adaptive retrieval policies that consider model capacity and topic characteristics.
👉 More information
🗞 Context-Augmented Code Generation Using Programming Knowledge Graphs
🧠 ArXiv: https://arxiv.org/abs/2601.20810
