Growing demand for code modernisation and cross-platform compatibility is driving research into automated code translation, yet ensuring that translated code behaves the same as the original remains a major challenge. Ali Reza Ibrahimzada, Brandon Paulsen, and Reyhaneh Jabbarvand, alongside Joey Dodds and Daniel Kroening, present MatchFixAgent, a framework designed to validate and repair code translations across a wide range of programming languages. The work addresses the limitations of existing methods, which often struggle to generalise across languages and rely on potentially flawed test suites, by employing a multi-agent system powered by large language models. The team reports that MatchFixAgent produces an equivalence verdict for the vast majority of translation pairs, is correct in a significant number of the cases where its verdict disagrees with prior work, and repairs a substantially greater proportion of faulty translations, a considerable advance in the reliability and adaptability of automated code translation tools.

Autonomous Repository-Level Code Translation Validation and Repair

MatchFixAgent is a system that automatically verifies and corrects code translations between programming languages. The method is designed to work independently of the languages involved, with the aim of making translated code more accurate and reliable. The system generates a comprehensive set of tests from the history of the original code, then runs these tests on both the original and translated versions. Discrepancies in the test results highlight potential translation errors. MatchFixAgent then attempts to diagnose and repair them by pinpointing the source of each failure and creating a targeted correction that preserves the original intent of the code. Experiments on open-source projects indicate that MatchFixAgent accurately identifies and repairs translation errors, significantly reducing the manual effort needed to validate and maintain translated codebases.
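
The idea of running the same tests against the original and the translation and comparing the outcomes can be illustrated with a small differential-testing sketch. This is not MatchFixAgent's implementation: both functions are written in Python as stand-ins for code in two different languages, and every name here (`run`, `differential_test`, `original_div`, `translated_div`) is hypothetical. The example assumes a common translation slip, where truncating integer division in the source language is rendered as floor division in the target.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple


@dataclass
class Mismatch:
    """One input on which the original and the translation disagree."""
    args: Tuple[Any, ...]
    expected: Any  # outcome observed from the original
    actual: Any    # outcome observed from the translation


def run(fn: Callable[..., Any], args: Tuple[Any, ...]) -> Tuple[str, Any]:
    """Call fn and record either its return value or the exception type name."""
    try:
        return ("ok", fn(*args))
    except Exception as exc:
        return ("error", type(exc).__name__)


def differential_test(
    original: Callable[..., Any],
    translated: Callable[..., Any],
    inputs: Iterable[Tuple[Any, ...]],
) -> List[Mismatch]:
    """Run both versions on each input and collect every disagreement."""
    mismatches = []
    for args in inputs:
        expected = run(original, args)
        actual = run(translated, args)
        if expected != actual:
            mismatches.append(Mismatch(args, expected, actual))
    return mismatches


# Hypothetical stand-in for the source program: division truncates toward zero.
def original_div(a: int, b: int) -> int:
    return int(a / b)


# Hypothetical stand-in for a faulty translation: division floors instead.
def translated_div(a: int, b: int) -> int:
    return a // b


inputs = [(7, 2), (-7, 2), (7, -2), (1, 0)]
mismatches = differential_test(original_div, translated_div, inputs)
for m in mismatches:
    print(f"args={m.args} original={m.expected} translated={m.actual}")
```

Under these assumptions, the two versions agree on positive operands and on division by zero (both raise the same exception type), and they diverge only when the operands have opposite signs. A discrepancy of this kind is what would be handed to a diagnosis and repair step.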

Language Models and Software Toolkits Evaluated

The research draws on several programming languages, large language models, and software toolkits. These include GPT-4o, Claude, Gemini Pro, and OpenAI Codex, alongside benchmarks such as BigCodeBench and AI development tools such as OpenHands and Moatless Tools. Supporting software and libraries include tools for mathematical operations, checkdigit calculations, color conversion, and heap queue algorithms, as well as Python’s HTML parsing library. The work also builds on research presented at conferences on software maintenance, testing, engineering, and programming languages, including ICSME, ISSTA, FSE, and PLDI, and on publications in journals such as Science China Information Sciences. Taken together, these sources show a strong focus on applying large language models to software engineering tasks, with an emphasis on benchmarking and evaluating LLM capabilities in code-related areas. The projects span several stages of the software engineering lifecycle, including code translation, improvement, testing, and maintenance, reflecting broad community effort in this field.

MatchFixAgent Validates and Repairs Code Translations

This work presents MatchFixAgent, a technique that combines program analysis with large language model agents to automatically validate and repair code translations across multiple programming languages. The system systematically generates targeted tests to demonstrate functional equivalence or expose semantic bugs in translated code, and produces reports that explain its findings. The team reports that the system produced equivalence verdicts for the vast majority of translation pairs, and that its verdict proved correct in a significant proportion of the cases where it disagreed with existing techniques. Notably, MatchFixAgent repaired a substantially higher percentage of inequivalent translations than prior work, demonstrating improved adaptability and precision. The system is also cost-effective and scalable, requiring minimal code to support additional programming languages and validating instances relatively quickly. To the best of the authors’ knowledge, MatchFixAgent is the first approach capable of effectively validating and repairing translations at the repository level across multiple programming languages.

👉 More information
🗞 MatchFixAgent: Language-Agnostic Autonomous Repository-Level Code Translation Validation and Repair
🧠 ArXiv: https://arxiv.org/abs/2509.16187

Tags:

code translation, equivalence validation, Large Language Models, Multi-Agent Architecture, programming languages, Repair, Semantic Analysis, Test generation

MatchFixAgent Achieves 72.8% Functional Equivalence in Repository-level Code Translation Validation and Repair, Exceeding 60.7% Accuracy

Rohail T.
