AI Editing Keeps Skills Intact with New Method

Researchers are tackling a critical problem: targeted edits to large language models (LLMs) often inadvertently degrade performance well beyond the specific changes made. Zarif Ikram, Arad Firouzkouhi, and Mahdi Soltanolkotabi, working with Stephen Tu and Paria Rashidinejad, all at the University of Southern California, present CrispEdit, a new second-order editing algorithm designed to explicitly preserve capabilities while modifying LLM behaviour. The work is significant because it formulates editing as a constrained optimisation problem, projecting updates onto a low-curvature subspace to avoid the ‘proxy gaming’ and capability corruption seen in previous approaches, and it achieves substantial improvements in edit success with minimal capability degradation across standard benchmarks.

Large language models are powerful tools, but modifying their behaviour without unintended consequences remains a major hurdle. A fresh technique promises more precise control, allowing developers to make targeted changes to these systems without damaging their broader abilities, an advance that could unlock safer and more reliable artificial intelligence applications.

This work addresses a significant problem in the field: existing editing techniques often inadvertently degrade a model’s general knowledge and reasoning skills, a phenomenon akin to proxy or reward hacking. The research introduces CrispEdit, a scalable algorithm that explicitly prioritises capability preservation as a core constraint during the editing process.

Unlike previous approaches that rely on restrictive assumptions, CrispEdit formulates editing as a constrained optimisation problem, ensuring changes align with the model’s existing strengths. At the heart of CrispEdit lies a technique for projecting edit updates onto a specific subspace of the model’s “capability-loss landscape”, effectively steering changes away from areas that could cause degradation.
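In symbols, and using notation of our own rather than the paper’s, the formulation described later in the article amounts to minimising the edit loss subject to a budget ε on how much the capability loss may rise:

```latex
\min_{\Delta\theta} \; \mathcal{L}_{\text{edit}}(\theta + \Delta\theta)
\quad \text{subject to} \quad
\mathcal{L}_{\text{cap}}(\theta + \Delta\theta) - \mathcal{L}_{\text{cap}}(\theta) \le \varepsilon
```

Here θ denotes the model’s parameters, Δθ the proposed edit, and ε the tolerance on capability degradation.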

This projection leverages Bregman divergence, a mathematical tool that allows for accurate measurement of capability preservation even when the underlying model hasn’t been fully trained. Furthermore, the team devised a way to make this computationally intensive process efficient at the scale of modern LLMs, utilising Kronecker-factored approximate curvature and a novel matrix-free projector to avoid the need for massive data storage.

Initial benchmarks demonstrate CrispEdit achieves high edit success rates while limiting capability degradation to below 1% across several datasets, a substantial improvement over existing editors. The study details how CrispEdit differs from earlier methods, which often impose constraints in parameter or representation space without directly addressing capability preservation.

These prior techniques frequently struggle when tested on realistic, open-ended tasks. CrispEdit, by contrast, adopts a first-principles approach, framing editing as the problem of minimising edit loss while simultaneously maintaining broader capabilities. This is achieved by analysing the curvature of the model’s loss landscape and identifying directions in which changes are less likely to disrupt existing knowledge.
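To make the geometric idea concrete, the sketch below, our illustration rather than the paper’s code, eigendecomposes a small curvature matrix for the capability loss and projects a proposed update onto the eigenvectors with the smallest eigenvalues. A dense decomposition is only feasible at toy scale, which is exactly what the K-FAC machinery described next avoids at LLM scale.

```python
import numpy as np

def project_to_low_curvature(delta, H, energy=0.99):
    """Project an update onto the low-curvature eigen-subspace of H.

    delta  : flat parameter update, shape (d,)
    H      : symmetric curvature matrix of the capability loss, shape (d, d)
    energy : fraction of total curvature mass treated as high-curvature;
             directions carrying it are excluded from the update.
             (The name mirrors the article's "energy thresholds"; the exact
             rule used in the paper may differ.)
    """
    eigvals, eigvecs = np.linalg.eigh(H)               # ascending eigenvalues
    frac = np.cumsum(eigvals[::-1]) / eigvals.sum()    # mass of the top-k directions
    n_high = int(np.searchsorted(frac, energy)) + 1    # high-curvature count
    low_basis = eigvecs[:, : len(eigvals) - n_high]    # low-curvature eigenvectors
    return low_basis @ (low_basis.T @ delta)           # orthogonal projection
```

Changes along the retained directions barely move the capability loss to second order, which is why the projected edit leaves existing knowledge largely intact.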

The researchers also needed to develop a practical way to apply this knowledge at scale, utilising Kronecker-factored approximate curvature (K-FAC) to efficiently estimate the curvature without excessive computational resources. Moreover, the novel matrix-free projector avoids the need to construct and store large matrices, further enhancing scalability.
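For a single linear layer, K-FAC approximates that layer’s curvature block as a Kronecker product of two small covariance matrices, one over the layer’s inputs and one over the gradients at its outputs. A minimal PyTorch sketch, with shapes and conventions that are our assumption:

```python
import torch

def kfac_factors(activations, output_grads):
    """Estimate the two K-FAC Kronecker factors for one linear layer.

    activations  : (n, d_in)  layer inputs collected on capability data
    output_grads : (n, d_out) loss gradients w.r.t. the layer's outputs

    The full curvature block over the d_out * d_in weight entries is
    approximated as the Kronecker product of S and A, so only a
    (d_out, d_out) and a (d_in, d_in) matrix ever need to be stored.
    """
    n = activations.shape[0]
    A = activations.T @ activations / n    # input covariance, (d_in, d_in)
    S = output_grads.T @ output_grads / n  # output-grad covariance, (d_out, d_out)
    return A, S
```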

Across standard model-editing benchmarks, including MMLU, GSM8K, and IFEval, CrispEdit consistently outperformed previous editors, demonstrating its potential to unlock more reliable and versatile LLM editing. This advancement could prove vital as LLMs become increasingly integrated into critical applications, demanding both adaptability and unwavering performance.

Capability preservation via low-curvature updates during model editing

Across standard editing benchmarks, CrispEdit maintained capability degradation below 1% on average, a substantial improvement over previous editors. This performance was achieved while successfully completing edits, showing that the trade-off between modifying model behaviour and preserving existing skills can be navigated far more favourably than before. Controlled experiments on the MNIST to FashionMNIST image classification task revealed that projecting updates onto the low-curvature subspace of the capability-loss landscape yielded the strongest preservation of original abilities.

K-FAC, a technique for approximating curvature, closely mirrored this behaviour at a reduced computational cost. Scaling the approach to large language models, LLaMA-3-8B-Instruct and Qwen-2.5-1.5B-Instruct, showed that edits remained reliable in standalone text generation. Edited models generalised effectively across semantically similar queries, indicating the changes were not merely memorisation of the edit pairs.

Importantly, out-of-scope knowledge and core skills like reasoning, instruction-following, and truthfulness were largely unaffected. Batch editing, applying numerous edits simultaneously, and sequential editing, applying edits in stages, both benefited from CrispEdit’s approach. The research introduces a matrix-free projector that exploits the Kronecker eigen-structure of the curvature, avoiding the construction of massive projection matrices.

This innovation makes constraint-aware second-order editing feasible and allows for precomputing capability curvature statistics for reuse, reducing computational demands. Using cached curvature, applying 3000 edits required only six minutes on an NVIDIA A40 GPU. The work formulates model editing as constrained optimisation, minimising edit loss while ensuring the change in capability loss remains within a tolerance of ε. This approach unifies and extends many existing model editing frameworks, offering a more general and principled solution.

Constrained optimisation via Bregman divergence and Kronecker-factored curvature

CrispEdit, a second-order editing algorithm, starts from a formulation of editing as constrained optimisation, directly addressing the challenge of preserving a large language model’s capabilities during targeted modifications. This approach moves beyond simple parameter updates by explicitly considering the impact on broader performance. Central to the work is the use of Bregman divergence, a mathematical tool for measuring the difference between two states, to express the capability constraint.

Specifically, the quadratic form of this divergence accurately yields the Gauss-Newton Hessian, even when the underlying language model has not been fully trained. To manage the computational demands of this second-order procedure at the scale of modern LLMs, researchers implemented Kronecker-factored approximate curvature, or K-FAC. K-FAC is a technique for efficiently approximating the curvature of the loss landscape, allowing for faster and more scalable optimisation.
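For readers who want the identity behind that claim: the Bregman divergence generated by a convex function F measures the gap between F and its linearisation, and when F is the loss viewed as a function of the network outputs z, its quadratic form pulled back through the network Jacobian J is precisely the Gauss-Newton matrix (our notation, not the paper’s):

```latex
D_F(z', z) = F(z') - F(z) - \nabla F(z)^\top (z' - z),
\qquad
G(\theta) = J_\theta^\top \, \nabla_z^2 F(z) \, J_\theta
```

Because this construction never assumes the gradient is zero, the curvature estimate remains meaningful away from a minimum, which matches the article’s point about models that have not been fully trained.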

Rather than constructing and storing massive projection matrices, a novel matrix-free projector was developed. It exploits the Kronecker structure inherent in the curvature approximation, significantly reducing memory requirements and computational cost. With this machinery in place, edit updates are projected onto the low-curvature subspace of the capability-loss landscape.
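The article does not reproduce the projector itself, but the Kronecker identity it exploits is standard: eigendecompose the factors as A = U_A Λ_A U_Aᵀ and S = U_S Λ_S U_Sᵀ, and every eigenvalue of the Kronecker-factored curvature is a product λ_S·λ_A, so a weight update can be rotated into the joint eigenbasis with two small matrix multiplications instead of ever forming the full matrix. A hedged PyTorch sketch, with the thresholding rule as our assumption:

```python
import torch

def matrix_free_project(dW, A, S, tol):
    """Project a weight update onto the low-curvature subspace of the
    Kronecker-factored curvature built from S and A.

    dW  : (d_out, d_in) proposed update for one linear layer's weights
    A   : (d_in, d_in)  K-FAC input-covariance factor
    S   : (d_out, d_out) K-FAC output-grad-covariance factor
    tol : curvature threshold; coordinates whose curvature exceeds it are
          zeroed, since moving along them would disturb capabilities most.
    """
    lam_A, U_A = torch.linalg.eigh(A)
    lam_S, U_S = torch.linalg.eigh(S)
    curvature = lam_S[:, None] * lam_A[None, :]  # eigenvalues of the Kronecker product
    coords = U_S.T @ dW @ U_A                    # rotate into the joint eigenbasis
    coords = coords * (curvature <= tol)         # keep only flat directions
    return U_S @ coords @ U_A.T                  # rotate back to weight space
```

The two eigendecompositions depend only on the capability data, not on any particular edit, so they can be computed once and cached, consistent with the article’s report that 3000 edits took about six minutes on a single A40 GPU.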

This projection ensures that changes primarily occur in directions that minimise disruption to the model’s existing abilities. Unlike previous methods that restrict updates to specific parameters or representations, CrispEdit’s approach is more flexible and directly tied to maintaining overall performance. Once projected, the updates are applied to the model, refining its behaviour without causing unintended consequences.

Evaluating the effectiveness of any editing method requires careful consideration of both edit success and capability preservation. The work therefore employed standard model-editing benchmarks to assess CrispEdit’s performance, measuring its ability both to implement the desired changes accurately and to avoid degrading the model’s general knowledge and reasoning skills. These benchmarks included assessments of reliability and generality with and without question-answering context, alongside evaluations on the MMLU, GSM8K, IFEval, ARC, and TruthfulQA datasets.

Preserving language model abilities during targeted modification through constrained optimisation

Scientists have long struggled with the delicate balance between modifying large language models (LLMs) and preserving their inherent capabilities. For years, editing these models to correct biases or remove harmful content often came at the cost of general knowledge and reasoning skills, a frustrating trade-off for developers. Previous methods, while showing promise, frequently introduced unintended consequences, essentially breaking what they were trying to fix.

This new work, CrispEdit, represents a step forward by explicitly addressing capability preservation as a core constraint during the editing process, rather than an afterthought. The significance of CrispEdit extends beyond simply achieving higher scores on benchmark tests. It introduces a principled, second-order optimisation approach that carefully navigates the complex “capability-loss landscape” of LLMs.

By projecting edit updates onto areas of stability, the algorithm minimises the risk of damaging the model’s broader intelligence. Once a system can reliably refine behaviour without eroding core skills, the possibilities expand dramatically, from tailoring models for specific applications to creating more trustworthy and aligned AI assistants. The reliance on approximations to manage computational demands at scale introduces potential limitations, and the extent to which these results generalise to even larger models remains an open question.

However, the consistent performance across varied datasets and energy thresholds is encouraging. Unlike many current editing techniques that are sensitive to hyperparameter settings, CrispEdit demonstrates a degree of robustness that is valuable for practical deployment. For the field, this signals a move towards more controlled and predictable LLM editing.

Beyond this specific implementation, the underlying principle of constrained optimisation could inspire new approaches to model adaptation and personalisation. Researchers will need to investigate how to refine the capability constraint itself further, and to explore methods for automatically detecting and mitigating potential side effects before they manifest.

👉 More information
🗞 CrispEdit: Low-Curvature Projections for Scalable Non-Destructive LLM Editing
🧠 ArXiv: https://arxiv.org/abs/2602.15823

Rohail T.

I am a quantum scientist exploring the frontiers of physics and technology. My work focuses on uncovering how quantum mechanics, computing, and emerging technologies are transforming our understanding of reality. I share research-driven insights that make complex ideas in quantum science clear, engaging, and relevant to the modern world.
