The pursuit of accurate and nuanced grammatical error correction in natural language processing continues to drive innovation in large language models. While decoder-only models demonstrate proficiency in generating fluent text, optimising them for ‘minimal-edit’ correction – altering text with the fewest possible changes while maintaining meaning – presents a distinct challenge. Researchers at Adam Mickiewicz University – Ryszard Staruch, Filip Graliński, and Daniel Dzienisiewicz – address this challenge through a novel training methodology and a critical re-evaluation of standard datasets. Their work, detailed in “Adapting LLMs for Minimal-edit Grammatical Error Correction”, introduces an error rate adaptation technique, highlights inconsistencies within commonly used English grammatical error correction datasets, and analyses the impact of training models on corrected versions of these datasets. The team’s experiments establish a new benchmark for single-model performance on the BEA test set and, to promote transparency and further research, they have publicly released the source code used in their experiments.
Recent advances demonstrate the successful application of decoder-only large language models to minimal-edit English Grammatical Error Correction (GEC), achieving a new state-of-the-art result on the BEA test set with a single model. This research addresses a noted gap in the field by investigating error rate adaptation – tuning how often the model sees erroneous versus error-free sentences during training – and proposing a novel training schedule designed to enhance performance in minimal-edit scenarios. Minimal-edit GEC prioritises correcting errors with the fewest possible alterations to the original text, a nuance often overlooked in standard GEC evaluations: given ‘He go to school yesterday’, a minimal-edit system outputs ‘He went to school yesterday’ rather than a fluency rewrite such as ‘Yesterday he attended school’. Experiments confirm the effectiveness of this approach, improving GEC accuracy and underscoring the importance of training data composition.
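Error rate adaptation is not spelled out in detail above, but one simple way to realise the idea – shown purely as an illustration under assumed data structures, not as the authors’ actual method – is to subsample error-free sentence pairs until erroneous pairs make up a chosen proportion of the training data:

```python
import random

def adapt_error_rate(pairs, target_rate, seed=0):
    """Illustrative error rate adaptation: subsample error-free (source, target)
    pairs so that erroneous pairs make up target_rate of the result.
    The paper's actual adaptation technique may differ."""
    erroneous = [p for p in pairs if p[0] != p[1]]
    clean = [p for p in pairs if p[0] == p[1]]
    # Solve target_rate = len(erroneous) / (len(erroneous) + keep) for keep.
    keep = int(len(erroneous) * (1 - target_rate) / target_rate)
    random.Random(seed).shuffle(clean)
    return erroneous + clean[:keep]

# Example: force a 70% erroneous-sentence rate in the adapted training set.
data = [("He go home .", "He goes home ."), ("All good .", "All good .")] * 100
adapted = adapt_error_rate(data, target_rate=0.7)
```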
The study emphasises the critical role of data quality, specifically addressing inconsistencies within commonly used GEC datasets. The researchers detokenised the standard datasets – FCE (First Certificate in English), BEA, CoNLL-2014, and JFLEG – restoring natural spacing and punctuation to ensure consistent text formatting. This process surfaced and corrected errors in the original datasets themselves, chiefly discrepancies in spacing and punctuation. The JFLEG development set exhibits the highest proportion of erroneous sentences at 95.36%, while the FCE-Train dataset contains 65.43% erroneous sentences, illustrating how widely error rates vary across corpora and why thorough data preprocessing matters. The analysis then assesses the impact of training models on these corrected, detokenised datasets.
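The exact detokenisation routine is not reproduced in this summary, but the transformation itself is easy to sketch. The snippet below assumes whitespace-tokenised input of the kind distributed with FCE- and BEA-style corpora and uses the sacremoses detokeniser – an illustrative choice, not necessarily the one the authors used:

```python
# Minimal detokenisation sketch for whitespace-tokenised GEC data.
# sacremoses is an illustrative choice; the paper's exact routine may differ.
from sacremoses import MosesDetokenizer

detok = MosesDetokenizer(lang="en")

def detokenise_line(line: str) -> str:
    """Rejoin a tokenised sentence into naturally spaced text."""
    return detok.detokenize(line.split())

# Tokenised sentences as typically distributed in GEC corpora:
print(detokenise_line("I do n't know ."))        # -> I don't know.
print(detokenise_line("It 's fine , thanks ."))  # -> It's fine, thanks.
```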
Models were trained on between two and four NVIDIA A100 GPUs, with training times of two to three hours per model. The training regime employs the AdamW8bit optimiser, a memory-efficient 8-bit implementation of the AdamW optimiser, with a learning rate of 5e-6, a batch size of four, and a linear learning rate scheduler. Four gradient accumulation steps and a warmup period of 100 steps per dataset further refine the training process, with a single epoch per dataset and a weight decay of 0.01. During both training and inference, the models are prompted with a specific instruction to correct text with minimal changes, reinforcing the focus on minimal-edit GEC.
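For readers wanting to reproduce the recipe, the reported hyperparameters map onto a Hugging Face `TrainingArguments` configuration roughly as follows. This is a sketch under stated assumptions: the output directory, precision setting, and instruction wording are placeholders, and the per-device batch placement and the paper’s exact prompt may differ:

```python
# Sketch of the reported training setup via Hugging Face Trainer arguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="minimal-edit-gec",      # placeholder
    optim="adamw_bnb_8bit",             # AdamW8bit via bitsandbytes
    learning_rate=5e-6,
    per_device_train_batch_size=4,      # "batch size of four"; device placement assumed
    gradient_accumulation_steps=4,
    lr_scheduler_type="linear",
    warmup_steps=100,                   # warmup per dataset, as reported
    num_train_epochs=1,                 # one epoch per dataset
    weight_decay=0.01,
    bf16=True,                          # assumption: mixed precision on A100s
)

# Hypothetical minimal-edit instruction; the paper's exact wording is not given here.
PROMPT = "Correct all grammatical errors in the text, changing as little as possible:\n{text}"
```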
Results demonstrate that training on detokenised datasets improves performance and establishes a new single-model benchmark on the BEA test set, indicating the effectiveness of the proposed training schedule and the importance of data quality in minimal-edit GEC. This confirms that meticulous data curation yields substantial improvements in GEC systems, and the released source code facilitates further investigation and invites community contributions to advance minimal-edit GEC.
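The BEA test set is conventionally scored with ERRANT’s span-based F0.5, which weights precision twice as heavily as recall – a natural fit for minimal-edit correction, where unnecessary changes are penalised. Below is a minimal sketch of that metric using the errant package; the sentences and the simplified span matching are illustrative, and official BEA test scores come from the shared-task server:

```python
# Minimal sketch of ERRANT-style span scoring (the metric behind BEA evaluation).
import errant

annotator = errant.load("en")  # requires the en_core_web_sm spaCy model

def edit_spans(orig: str, cor: str) -> set:
    """Extract (start, end, correction) edit spans between two tokenised sentences."""
    o, c = annotator.parse(orig), annotator.parse(cor)
    return {(e.o_start, e.o_end, e.c_str) for e in annotator.annotate(o, c)}

def f05(src: str, hyp: str, ref: str) -> float:
    hyp_e, ref_e = edit_spans(src, hyp), edit_spans(src, ref)
    tp = len(hyp_e & ref_e)
    p = tp / len(hyp_e) if hyp_e else 1.0
    r = tp / len(ref_e) if ref_e else 1.0
    # F0.5 = (1 + 0.5^2) * P * R / (0.5^2 * P + R)
    return 1.25 * p * r / (0.25 * p + r) if (p + r) else 0.0

print(f05("He go to school .",    # source
          "He goes to school .",  # system hypothesis
          "He goes to school .")) # reference -> 1.0
```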
👉 More information
🗞 Adapting LLMs for Minimal-edit Grammatical Error Correction
🧠 DOI: https://doi.org/10.48550/arXiv.2506.13148
