New Method Evaluates Legal Text Style, Offering an Alternative to 2 Expert Reviews

Researchers are developing new methods to assess the stylistic quality of legal text generated by artificial intelligence. Yiran Rex Ma from the School of Foreign Languages and Center for Digital Humanities at Peking University, alongside Yuxiao Ye and Huiyuan Xie from the Department of Computer Science and Technology at Tsinghua University, present a novel hybrid evaluation method called CLASE (Chinese LegAlese Stylistic Evaluation). This work addresses a significant gap in current evaluation techniques, which often prioritise factual accuracy over adherence to the nuanced stylistic conventions of legal writing. By combining linguistic feature analysis with an experience-guided large language model approach, learned from authentic legal documents, CLASE offers a transparent and reference-free means of gauging stylistic fidelity. Experiments demonstrate that CLASE more closely aligns with human judgement than existing automatic metrics, providing a scalable and interpretable solution for improving the quality of AI-generated legal content.

While large language models increasingly produce factually accurate legal text, they often fail to replicate the nuanced, specialised writing style that legal documentation requires.

Establishing a reliable evaluation method is crucial for improving the stylistic performance of automatically generated legal content. Current methods prove inadequate, as reference-based metrics confuse semantic accuracy with stylistic fidelity and LLM-based evaluations lack transparency and consistency. This work introduces a hybrid approach that combines linguistic feature analysis with insights from large language models, offering a more robust and interpretable solution.

CLASE learns from comparisons between authentic legal documents and versions refined by language models, capturing both surface-level features and implicit stylistic norms. The system avoids reliance on manual annotation by legal experts, which is impractical at scale because stylistic expertise is largely tacit and hard to articulate. This design allows for scalable and consistent assessment of legal writing style.

Experiments conducted on 200 Chinese legal documents show that CLASE achieves substantially higher alignment with human judgements than traditional metrics and other LLM-based evaluation methods. Beyond simply scoring text, CLASE provides detailed score breakdowns and suggestions for improvement, offering a practical tool for professional stylistic evaluation.
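To make "alignment with human judgements" concrete, the sketch below computes a Spearman rank correlation between human ratings and an automatic metric. This is an illustrative assumption: the article does not say which agreement statistic the authors report, and the ratings and scores here are invented toy values.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def spearman(a, b):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

# Invented toy data: five documents rated by humans and scored by a metric.
human_ratings = [4.5, 3.0, 2.0, 5.0, 1.0]
metric_scores = [0.80, 0.55, 0.60, 0.90, 0.20]
rho = spearman(human_ratings, metric_scores)
```

A value of `rho` near 1 would mean the metric orders documents much as the human raters do; comparing this value across metrics is one common way to claim that one metric aligns better with human judgement than another.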

The research demonstrates a significant advancement in the field of legal text generation, paving the way for more sophisticated and credible automated legal writing systems. The development of CLASE represents a key step towards ensuring that artificial intelligence can not only understand and reason about the law, but also communicate legal information with the precision and authority expected by professionals and the public alike.

Code and data supporting this research are openly available, facilitating further investigation and development within the legal technology community. In the reported experiments, CLASE aligns more closely with human judgement than traditional metrics and other LLM-based evaluation methods.

The study employed a hybrid scoring mechanism, combining linguistic feature analysis with experience-guided LLM assessments to achieve this result. This approach allows for both quantifiable measurements and the capture of implicit stylistic norms, a critical aspect of legal writing. By learning from contrastive pairs of authentic legal documents and their LLM-restored counterparts, CLASE effectively discerns subtle stylistic nuances often missed by conventional evaluation techniques.
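The hybrid idea described above can be sketched as a simple blend of two scores. The article does not state how CLASE actually combines its components, so the convex combination, the `alpha` weight, and the function name `hybrid_score` are all hypothetical choices made for illustration.

```python
def hybrid_score(feature_score, llm_score, alpha=0.5):
    """Blend two scores in [0, 1]; alpha is the (assumed) weight on the feature-based score."""
    checks = (("feature_score", feature_score), ("llm_score", llm_score), ("alpha", alpha))
    for name, value in checks:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    feature_part = alpha * feature_score
    llm_part = (1.0 - alpha) * llm_score
    # Returning the parts alongside the total keeps the score inspectable.
    return {
        "feature_component": feature_part,
        "llm_component": llm_part,
        "total": feature_part + llm_part,
    }

# Invented example values, not results from the paper.
result = hybrid_score(feature_score=0.8, llm_score=0.6, alpha=0.25)
```

Keeping the two components separate in the returned dictionary mirrors, in a very small way, the kind of score breakdown the article attributes to CLASE.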

Furthermore, CLASE provides interpretable score breakdowns and actionable suggestions for improvement, allowing for targeted refinement of generated legal text, addressing issues such as imprecise diction, inappropriate lexical choices, and violations of linguistic conventions. The research highlights that stylistic quality is as crucial as semantic accuracy in legal competence.

Traditional evaluation methods often prioritise factual correctness, neglecting the intricate stylistic dimensions that legal professionals recognise as “professional” or “sophisticated”. CLASE directly addresses this gap, offering a scalable and practical solution for evaluating stylistic fidelity in legal text generation.

A contrastive learning approach underpinned the development of CLASE. The research assembled a dataset of authentic legal documents paired with versions of the same documents restored by large language models. The LLM-restored counterparts served as negative examples, highlighting stylistic deficiencies commonly found in machine-generated text. This pairing strategy enabled the system to discern subtle differences between professional legal writing and its automated approximations.

Subsequently, a hybrid scoring mechanism was implemented, combining linguistic feature-based scores with experience-guided LLM-as-a-judge scores. Linguistic features, encompassing elements such as sentence length, lexical density, and the frequency of legal-specific terminology, were extracted from each document using natural language processing techniques.
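As a rough illustration of what surface-level feature extraction might look like, the sketch below computes two of the features named above, average sentence length and legal-term frequency, for a Chinese passage. The term list `LEGAL_TERMS`, the function names, and the sample sentence are hypothetical; the article does not describe the authors' actual feature set or tooling. Lexical density is left out because it would need a word segmenter and part-of-speech tagger.

```python
import re

# Hypothetical mini-lexicon; the real system's legal vocabulary is not given in the article.
LEGAL_TERMS = ("被告", "原告", "判决", "诉讼", "依法", "本院")

def split_sentences(text):
    """Split on common Chinese sentence-ending punctuation and drop empty pieces."""
    parts = re.split(r"[。！？；]", text)
    return [p.strip() for p in parts if p.strip()]

def stylistic_features(text):
    """Return average sentence length (in characters) and legal-term hits per character."""
    sentences = split_sentences(text)
    n_chars = sum(len(s) for s in sentences)
    avg_len = n_chars / len(sentences) if sentences else 0.0
    term_hits = sum(text.count(term) for term in LEGAL_TERMS)
    term_rate = term_hits / n_chars if n_chars else 0.0
    return {"avg_sentence_length": avg_len, "legal_term_rate": term_rate}

# Invented sample passage for demonstration only.
sample = "本院认为，被告应当依法承担责任。原告的诉讼请求成立。"
features = stylistic_features(sample)
```

Character-level counting is used here only because it avoids a dependency on a segmentation library; a production feature extractor would more plausibly work on segmented words.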

These quantifiable metrics provided a baseline assessment of stylistic characteristics. To capture more nuanced stylistic norms, a separate LLM-based evaluation component was incorporated, leveraging the capacity of large language models to assess text quality. Crucially, both the weighting of linguistic features and the LLM’s scoring criteria were learned directly from the contrastive dataset.

A machine learning algorithm was trained to identify the feature coefficients that best distinguished authentic legal text from its LLM-generated counterpart. Simultaneously, the LLM’s scoring experiences, its internal understanding of good legal style, were refined through exposure to the same paired data. This iterative learning process ensured that CLASE’s evaluation criteria aligned closely with established legal writing conventions.
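One plausible way to learn feature coefficients from contrastive pairs is a logistic regression that labels authentic documents 1 and LLM-restored ones 0. The article does not name the algorithm the authors used, so this is a swapped-in technique for illustration, and the feature vectors below are invented toy values.

```python
import math

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the logistic log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi  # gradient of the log-loss with respect to z
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that feature vector x comes from an authentic document."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Toy, made-up vectors: [scaled sentence length, scaled legal-term rate].
authentic = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7]]
restored = [[0.3, 0.2], [0.4, 0.1], [0.2, 0.3]]
X = authentic + restored
y = [1] * len(authentic) + [0] * len(restored)
w, b = train_logistic(X, y)
```

In a model of this kind, the sign and size of each learned weight indicate how strongly a feature separates authentic text from its restored counterpart, which is one way a feature-based score can stay interpretable.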

The design deliberately avoids reliance on reference texts, addressing a key limitation of existing automatic evaluation methods that can conflate semantic accuracy with stylistic fidelity.

For years, assessing the nuance of writing has proved stubbornly difficult to automate, relying instead on laborious human review. This is particularly true in highly specialised fields like law, where subtle linguistic choices can significantly alter meaning and affect legal outcomes.

The newly presented CLASE system offers a potential bridge between automated evaluation and the complex demands of legal drafting. What makes this work notable is not simply improved accuracy, measured as a substantially higher alignment with human judgements, but the approach taken to achieve it. By combining quantifiable linguistic features with insights gleaned from large language models, CLASE moves beyond simplistic comparisons of text and attempts to capture the implicit stylistic norms that define effective legal writing.

This hybrid design is a clever response to the limitations of both purely rule-based systems and the often opaque reasoning of LLMs used as sole evaluators. However, the current iteration focuses on Chinese legal documents, limiting its immediate applicability. Further research will need to demonstrate its effectiveness across different legal systems and languages. Moreover, while CLASE offers interpretable score breakdowns, the extent to which these insights can genuinely guide improvements in LLM-generated text remains to be seen.

👉 More information
🗞 CLASE: A Hybrid Method for Chinese Legalese Stylistic Evaluation
🧠 ArXiv: https://arxiv.org/abs/2602.12639

Rohail T.