LLMs and Subtitle Translation: Adversarial Training Improves Quality and Stability

Research reveals that reinforcement learning from human feedback performs suboptimally when applied to colloquial subtitle translation due to divergence between the reward model and the language model. The RIVAL framework, employing adversarial training and incorporating quantitative metrics, effectively addresses this by aligning model performance with human evaluation.

The fidelity of machine translation systems, particularly when applied to nuanced, informal text such as video subtitles, remains a significant challenge. Current approaches leveraging reinforcement learning from human feedback (RLHF) often exhibit diminished performance in these contexts due to discrepancies between the reward model, trained on offline data, and the evolving large language model (LLM) it seeks to optimise. Researchers from Bilibili Inc., Fudan University, and Xi’an Jiaotong University, including Tianjiao Li, Mengran Yu, and Qi Zhang, detail a novel adversarial training framework, RIVAL (Reinforcement Learning with Iterative and Adversarial Optimisation), designed to mitigate this issue. Their work, entitled “RIVAL: Reinforcement Learning with Iterative and Adversarial Optimization for Machine Translation”, proposes a min-max game between the reward model and the LLM, iteratively refining both to improve translation quality and alignment with human preferences, incorporating both qualitative and quantitative metrics.

Mitigating Reward Drift in Machine Translation via Adversarial Training

Recent progress in machine translation (MT) integrates reinforcement learning (RL) with large language models (LLMs), yielding improvements, particularly in tasks demanding nuanced understanding, such as colloquial subtitle translation. However, a critical vulnerability has emerged: divergence between the reward signal and the evolving LLM during training. This ‘reward drift’ compromises performance, as offline reward models (RMs), trained to assess translation quality, become misaligned with the LLM’s current translation strategy.

This misalignment stems from a distributional shift: the RM, trained on a static dataset, fails to accurately evaluate translations generated by an LLM undergoing continuous refinement. Consequently, the LLM receives inaccurate feedback, hindering its ability to optimise effectively.

Researchers are addressing this issue with adversarial training frameworks, notably RIVAL. This approach reformulates training as a competitive min-max game between the RM and the LLM: the RM learns to discriminate between high- and low-quality translations based on human preferences, while the LLM attempts to produce translations that the RM can no longer distinguish from the preferred ones. Iterating both updates keeps the reward signal aligned with the LLM’s current translation strategy.
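The alternating structure of this min-max game can be sketched in a few lines. The toy below is purely illustrative: the real framework updates a neural RM and an LLM by gradient methods, whereas here both are one-parameter scorers so the loop runs end-to-end; the names (`rm_step`, `llm_step`) and learning rates are assumptions for exposition, not the paper’s API.

```python
# Toy sketch of RIVAL-style iterative adversarial optimisation.
# rm_w  : reward-model parameter (scores translation "quality" q as rm_w * q)
# llm_q : scalar standing in for the quality of the LLM's current outputs

def rm_step(rm_w, good, bad, lr=0.1):
    """RM update: learn to score preferred translations above rejected ones."""
    margin = rm_w * good - rm_w * bad
    if margin < 1.0:                       # hinge-style preference loss
        rm_w += lr * (good - bad)          # widen the margin when violated
    return rm_w

def llm_step(llm_q, rm_w, lr=0.1):
    """LLM update: shift output quality to maximise the *current* RM score."""
    return llm_q + lr * rm_w               # gradient of rm_w * llm_q w.r.t. llm_q

rm_w, llm_q = 0.5, 0.2                     # arbitrary initial toy parameters
for _ in range(10):                        # alternating min-max rounds
    rm_w = rm_step(rm_w, good=1.0, bad=llm_q)  # RM trained against current LLM outputs
    llm_q = llm_step(llm_q, rm_w)              # LLM chases the updated RM
```

The point of the alternation is the one made in the text: because the RM is re-fit against the LLM’s latest outputs each round, its reward signal cannot drift arbitrarily far from the distribution the LLM actually produces.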

To further stabilise training and enhance generalisation, RIVAL incorporates quantitative preference rewards, such as BLEU scores, into the RM. BLEU (Bilingual Evaluation Understudy) assesses the similarity between machine-generated and reference translations by counting matching n-grams. This objective measure of surface accuracy complements the more nuanced qualitative judgements captured by human preferences; combining both yields a more robust and reliable training signal.

Experiments demonstrate that RIVAL significantly improves translation performance compared to baseline models. By actively mitigating reward drift through adversarial training and incorporating both qualitative and quantitative rewards, the framework produces more accurate and human-aligned translations, representing a notable advance in machine translation technology.

👉 More information
🗞 RIVAL: Reinforcement Learning with Iterative and Adversarial Optimization for Machine Translation
🧠 DOI: https://doi.org/10.48550/arXiv.2506.05070

Dr. Donovan

Dr. Donovan is a futurist and technology writer covering the quantum revolution. Where classical computers manipulate bits that are either on or off, quantum machines exploit superposition and entanglement to process information in ways that classical physics cannot. Dr. Donovan tracks the full quantum landscape: fault-tolerant computing, photonic and superconducting architectures, post-quantum cryptography, and the geopolitical race between nations and corporations to achieve quantum advantage. The decisions being made now, in research labs and government offices around the world, will determine who controls the most powerful computers ever built.
