Patent classification, crucial for understanding technological advancements, presents significant hurdles due to its complex, hierarchical and imbalanced nature. Lorenzo Emer, Marco Lippi, and Andrea Mina, from the Scuola Superiore Sant’Anna and the Universities of Pisa, Florence and Cambridge, together with Andrea Vandin and colleagues, investigate whether large language models (LLMs) can improve upon existing encoder-based methods, particularly for classifying less common technologies. Their systematic comparison using a benchmark dataset of US patents reveals that while encoder models excel at identifying frequent technological areas, LLMs demonstrate superior performance on rare or emerging fields, offering valuable insight into early-stage and cross-disciplinary innovation. This research highlights the complementary strengths of both approaches and informs the development of more efficient and comprehensive patent analysis tools, balancing accuracy with computational cost and environmental impact.
The research team evaluated BERT, SciBERT, and PatentSBERTa alongside open-weight LLMs on the USPTO-70k benchmark dataset, a highly imbalanced collection of 70,000 patents.
This work focused on assessing the ability of each approach to accurately assign Cooperative Patent Classification (CPC) codes, a crucial task for analysing technological change and tracking innovation. Experiments show that encoder-based models excel in classifying frequent CPC subclasses, achieving high aggregate performance due to their proficiency with well-represented technologies.
However, these models struggle with rare subclasses, often associated with nascent or cross-disciplinary fields. In contrast, LLMs demonstrate comparatively higher performance on these infrequent subclasses, particularly at higher hierarchical levels, suggesting an ability to better capture weakly institutionalised technologies.
The study unveils that this difference stems from the LLMs’ capacity to generalise from limited examples, offering improved coverage of the “long tail” of technological categories. Researchers evaluated LLMs using zero-shot, few-shot, and retrieval-augmented prompting techniques, and further refined the best-performing model with parameter-efficient fine-tuning.
This comprehensive assessment quantified not only classification accuracy but also inference time and energy consumption, revealing that encoder-based models are up to three orders of magnitude more efficient than LLMs. These findings inform responsible patentometrics and technology mapping, motivating the development of hybrid classification approaches that combine the efficiency of encoders with the long-tail coverage of LLMs, all while considering computational and environmental constraints. The work opens avenues for improved large-scale patent analytics and a more nuanced understanding of technological landscapes.
Comparative performance of encoder and large language models for imbalanced patent classification remains an open question
Researchers systematically compared encoder-based classifiers with open-weight large language models for patent classification using the USPTO-70k benchmark dataset. The study employed BERT, SciBERT, and PatentSBERTa as encoder baselines, evaluating LLMs under zero-shot, few-shot, and retrieval-augmented prompting conditions.
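As a rough illustration of what such an encoder baseline looks like in practice, the sketch below builds a multi-label CPC classifier on top of a pretrained BERT encoder. The model choice, label count, and decision threshold are assumptions for illustration, not the authors’ exact configuration.

```python
# Minimal sketch of an encoder baseline for multi-label CPC classification.
# Model name, label count, and threshold are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_CPC_SUBCLASSES = 600  # assumption: number of CPC subclass labels in the benchmark

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=NUM_CPC_SUBCLASSES,
    problem_type="multi_label_classification",  # per-label sigmoid + BCE loss
)

def predict_subclasses(abstract: str, threshold: float = 0.5) -> list[int]:
    """Return indices of CPC subclasses whose predicted probability exceeds the threshold."""
    inputs = tokenizer(abstract, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.sigmoid(logits).squeeze(0)
    return (probs > threshold).nonzero(as_tuple=True)[0].tolist()
```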
Parameter-efficient fine-tuning, utilising Low-Rank Adaptation (LoRA), was then applied to the best-performing LLM to further optimise its performance on the task. Experiments focused on a highly imbalanced dataset, reflecting the challenges of real-world patent classification where some technological categories are far more prevalent than others.
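For readers unfamiliar with LoRA, the sketch below shows how parameter-efficient fine-tuning is typically wired up with the Hugging Face `peft` library. The base model, rank, and target modules are illustrative assumptions rather than the paper’s reported settings.

```python
# Hedged sketch of LoRA-based parameter-efficient fine-tuning with `peft`.
# Base model, rank, and target modules are assumptions, not the paper's settings.
from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct"  # assumed open-weight LLM, for illustration only
)

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                                 # low-rank dimension of the adapter matrices
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections; depends on the architecture
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model's weights
```

Because only the small adapter matrices are updated, this keeps fine-tuning costs far below full fine-tuning of the LLM, which is the efficiency trade-off the study is probing.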
The team assessed performance across the entire CPC hierarchy, paying particular attention to the behaviour of each model on both frequent and infrequent subclasses. This granular analysis aimed to reveal whether LLMs could offer advantages in classifying emerging or niche technologies often underrepresented in training data.
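One way to make the frequent-versus-rare comparison concrete (a sketch under assumed data structures and an arbitrary frequency cut-off, not the paper’s protocol) is to compute macro-F1 separately over head and tail subclasses:

```python
# Illustrative frequency-stratified evaluation: macro-F1 over frequent ("head")
# and rare ("tail") CPC subclasses. Cut-off and data layout are assumptions.
from collections import Counter
from sklearn.metrics import f1_score

def stratified_f1(y_true, y_pred, train_labels, tail_cutoff=50):
    """y_true, y_pred: binary indicator arrays of shape (n_patents, n_subclasses);
    train_labels: list of label-index lists for the training set."""
    counts = Counter(idx for labels in train_labels for idx in labels)
    head = [c for c in range(y_true.shape[1]) if counts[c] >= tail_cutoff]
    tail = [c for c in range(y_true.shape[1]) if counts[c] < tail_cutoff]
    return {
        "head_macro_f1": f1_score(y_true[:, head], y_pred[:, head],
                                  average="macro", zero_division=0),
        "tail_macro_f1": f1_score(y_true[:, tail], y_pred[:, tail],
                                  average="macro", zero_division=0),
    }
```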
To quantify computational demands, researchers measured inference time and energy consumption for each model. This involved detailed profiling of model execution on standard hardware, providing a direct comparison of efficiency between encoder-based and LLM-based approaches. The study also rigorously evaluated retrieval-augmented generation (RAG) and hybrid RAG-plus-few-shot configurations, assessing how contextualisation impacts classification accuracy.
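A minimal profiling harness in this spirit might look as follows; the paper does not name its measurement tooling here, so the use of `codecarbon` and the `classify_batch` callable are assumptions for illustration.

```python
# Sketch of per-model inference profiling: wall-clock time plus an energy/carbon estimate.
# `codecarbon` is one common option; `classify_batch` stands in for the model under test.
import time
from codecarbon import EmissionsTracker

def profile_inference(classify_batch, patents):
    tracker = EmissionsTracker()
    tracker.start()
    start = time.perf_counter()
    predictions = [classify_batch(p) for p in patents]
    elapsed = time.perf_counter() - start
    emissions_kg = tracker.stop()  # estimated kg CO2-eq for the whole run
    return {
        "seconds_per_patent": elapsed / len(patents),
        "emissions_kg_co2eq": emissions_kg,
        "predictions": predictions,
    }
```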
By combining external document retrieval with prompting strategies, the team explored methods to ground LLM predictions in relevant domain-specific text. This approach revealed complementary strengths between encoder models and LLMs: encoder models excel with frequent subclasses, while LLMs show promise for rare ones. The findings inform responsible patentometrics and motivate hybrid classification approaches that combine the efficiency of encoders with the long-tail coverage of LLMs, addressing both computational and environmental concerns.
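The sketch below illustrates the general retrieval-augmented prompting idea: embed the query abstract, retrieve similar already-classified patents, and place them in the prompt as in-context examples. The embedding model, prompt wording, and helper names are assumptions, not the authors’ implementation.

```python
# Hedged sketch of retrieval-augmented few-shot prompting for CPC assignment.
# Embedding model and prompt wording are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

retriever = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def build_rag_prompt(query_abstract, corpus_abstracts, corpus_labels, k=3):
    """Retrieve the k most similar labelled abstracts and format them as in-context examples."""
    corpus_emb = retriever.encode(corpus_abstracts, convert_to_tensor=True)
    query_emb = retriever.encode(query_abstract, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, corpus_emb, top_k=k)[0]

    examples = "\n\n".join(
        f"Abstract: {corpus_abstracts[h['corpus_id']]}\n"
        f"CPC subclasses: {', '.join(corpus_labels[h['corpus_id']])}"
        for h in hits
    )
    return (
        "You are a patent examiner. Assign CPC subclass codes to the patent abstract.\n\n"
        f"Examples of classified patents:\n{examples}\n\n"
        f"Abstract: {query_abstract}\nCPC subclasses:"
    )
```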
Encoder models outperform on common technologies, while LLMs better capture emerging ones
Scientists conducted a systematic comparison of encoder-based classifiers and open-weight large language models (LLMs) for patent classification using the USPTO-70k benchmark dataset. Experiments revealed that encoder models, specifically BERT, SciBERT, and PatentSBERTa, achieved higher aggregate performance, demonstrating strength in classifying frequent CPC subclasses.
However, these encoder models struggled with rare subclasses, indicating a limitation in identifying emerging technologies. In contrast, LLMs demonstrated relatively higher performance on infrequent CPC subclasses, often linked to early-stage, cross-domain, or weakly institutionalised technologies, particularly at higher hierarchical levels.
Data shows that LLMs excel in areas where encoder models falter, suggesting complementary roles for each approach in patent classification tasks. The team measured performance across the entire hierarchy, revealing nuanced strengths for each model type depending on the prevalence of the CPC subclass. Further analysis quantified inference time and energy consumption, showing encoder-based models are up to three orders of magnitude more efficient than LLMs.
Tests show that while LLMs offer improved coverage of rare technological areas, they come at a significant computational cost, requiring substantially more resources for equivalent classification tasks. Researchers also assessed parameter-efficient fine-tuning of the best-performing LLM, aiming to balance performance and efficiency.
The study informs responsible patentometrics and technology mapping, motivating hybrid classification approaches that combine encoder efficiency with the long-tail coverage of LLMs. This work highlights the potential for combining the strengths of both methodologies under computational and environmental constraints, paving the way for more robust and sustainable patent analysis.
CPC classification performance diverges between frequent and rare technological subclasses, revealing model-specific strengths
Scientists systematically compared encoder-based classifiers and large language models for classifying patents into Cooperative Patent Classification (CPC) codes. The research utilised a highly imbalanced benchmark dataset of 70,000 United States Patent and Trademark Office (USPTO) patents to evaluate performance across the CPC hierarchy.
Results indicate that encoder-based models excel in classifying frequent CPC subclasses, achieving higher overall aggregate performance, but struggle with rare categories. Conversely, large language models demonstrated relatively higher performance on infrequent subclasses, particularly those associated with emerging, cross-disciplinary, or less established technologies.
This suggests a complementary relationship between the two approaches, with encoder models offering efficiency and LLMs providing broader coverage of the long tail of technological classifications. The authors also quantified inference time and energy consumption, finding encoder-based models to be significantly more efficient, up to three orders of magnitude, than large language models.
This study highlights a trade-off between computational efficiency and semantic coverage in patent classification. Encoder-based models remain effective for large-scale, routine classification, while LLMs can improve the visibility of rare technologies, potentially mitigating biases in scientometric analysis.
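One plausible hybrid routing scheme implied by this trade-off (an illustration only, not a pipeline proposed in the paper) would serve confident, frequent-subclass predictions from the encoder and escalate uncertain or long-tail cases to an LLM:

```python
# Illustrative hybrid router: cheap encoder for routine cases, LLM fallback for the long tail.
# Function signatures and the confidence threshold are assumptions, not the paper's design.
def hybrid_classify(abstract, encoder_predict, llm_predict, frequent_subclasses,
                    confidence_threshold=0.6):
    """encoder_predict returns {subclass: probability}; llm_predict returns a list of
    subclasses; frequent_subclasses is a set of well-represented subclass codes."""
    probs = encoder_predict(abstract)
    confident = {c for c, p in probs.items() if p >= confidence_threshold}

    # Accept the encoder's answer when it confidently hits well-represented subclasses.
    if confident and confident & frequent_subclasses:
        return sorted(confident)

    # Otherwise escalate to the slower, costlier LLM for long-tail coverage.
    return llm_predict(abstract)
```

Such a router keeps the bulk of routine classifications on the efficient encoder path while reserving LLM calls, and their energy cost, for the cases where long-tail coverage matters most.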
The authors acknowledge limitations related to data licensing restrictions and suggest future research could explore adaptive retrieval methods and human-in-the-loop evaluation to enhance practical application. Ultimately, the findings advocate for purpose-driven integration of LLMs, balancing their benefits with scalability and sustainability concerns in large-scale innovation studies.
👉 More information
🗞 Large Language Models for Patent Classification: Strengths, Trade-offs, and the Long Tail Effect
🧠 ArXiv: https://arxiv.org/abs/2601.23200
