Aspect-Category Sentiment Analysis (ACSA) unlocks detailed understanding of customer opinions by pinpointing specific topics within reviews and gauging associated sentiment, but building effective ACSA systems traditionally requires large amounts of labelled data which is expensive and time-consuming to create. Filippos Ventirozos from Manchester Metropolitan University, Peter Appleby from Autotrader Research Group, and Matthew Shardlow from Manchester Metropolitan University address this challenge by demonstrating how large language models can perform ACSA without any prior training on labelled examples, a technique known as zero-shot learning. The team introduces a new method using ‘Chain-of-Thought’ prompting, which guides the language model’s reasoning process via an intermediate ‘Unified Meaning Representation’, effectively structuring how it analyses text. Their evaluation across multiple language models and datasets reveals that this approach shows promise, particularly with mid-sized models, and opens avenues for applying ACSA to new areas where labelled data is scarce, although further investigation is needed to optimise performance across different model sizes.
The high cost of annotated data remains a major obstacle to applying large language models (LLMs) in new or low-resource domains. To address this challenge, the researchers propose a novel Chain-of-Thought (CoT) prompting approach that introduces an intermediate Unified Meaning Representation (UMR) to structure model reasoning and improve performance when labelled data are scarce. The method is designed to enhance aspect-category sentiment analysis (ACSA), a task where conventional techniques often struggle to capture multiple sentiments directed at different aspects within a single review.
Mechanics of the Unified Meaning Representation Approach
The proposed UMR-based CoT framework guides the LLM through a two-stage reasoning process. First, the model generates a structured UMR that explicitly identifies entities, their attributes, and associated sentiments expressed in the text. In the second stage, this structured representation is mapped to predefined aspect categories and sentiment polarities. Inspired by semantic parsing, the UMR serves as an interpretable summary of the review’s semantic content, enabling more accurate and nuanced sentiment extraction.
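The two-stage process above can be sketched as a simple prompting pipeline. The templates and the `call_llm` stub below are illustrative assumptions, not the paper's actual prompts; `call_llm` stands in for any chat-completion client.

```python
# Hypothetical sketch of the two-stage UMR-based CoT pipeline.
# Prompt wording is an assumption for illustration only.

STAGE1_PROMPT = (
    "Read the review and produce a Unified Meaning Representation: "
    "list each entity, its attribute, and the sentiment expressed.\n"
    "Review: {review}\nUMR:"
)

STAGE2_PROMPT = (
    "Given this UMR, map each entry to one of the aspect categories "
    "{categories} with a polarity (positive/negative/neutral).\n"
    "UMR: {umr}\nAspect-polarity pairs:"
)

def call_llm(prompt: str) -> str:
    """Stub: replace with a real model call (e.g. a Qwen or Gemini client)."""
    raise NotImplementedError

def umr_cot(review: str, categories: list[str], llm=call_llm) -> str:
    # Stage 1: elicit the structured intermediate representation.
    umr = llm(STAGE1_PROMPT.format(review=review))
    # Stage 2: map the UMR onto the predefined category/polarity schema.
    return llm(STAGE2_PROMPT.format(umr=umr, categories=categories))
```

Because the UMR is generated as explicit text before the final mapping step, it can be logged and inspected, which is what makes the reasoning path interpretable.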
Comparative Evaluation Across Diverse Language Models and Datasets
The researchers evaluated this approach against a baseline CoT strategy that directly prompts the model to output aspect–polarity pairs without an intermediate representation. Experiments were conducted across three LLMs (Qwen3-4B, Qwen3-8B, and Gemini-2.5-Pro) and four diverse ACSA datasets. Carefully designed prompts ensured consistent instructions across both reasoning strategies, allowing for a fair comparison of their effectiveness.
For example, the UMR for the sentence “The pizza was delicious but the service was terrible” would identify “pizza” and “service” as entities, “delicious” and “terrible” as opinions, and link each to its respective aspect. The researchers hypothesised that forcing the model to first build this intermediate UMR would improve the accuracy with which aspect categories are identified and associated with their sentiments. In practice, the experiments revealed that the effectiveness of the approach varies depending on the specific model and dataset combination. Notably, the Qwen3-8B model demonstrated performance comparable to standard chain-of-thought methods, suggesting that structured reasoning techniques can be beneficial under certain conditions.
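For the pizza example, the intermediate representation and its mapping to category–polarity pairs might look like the following. The field names, the category labels, and the entity-to-category table are assumptions for illustration; the paper's actual UMR notation may differ.

```python
# Sketch of the UMR for "The pizza was delicious but the service was terrible",
# expressed as plain Python data (field names are assumed, not the paper's).
umr = [
    {"entity": "pizza",   "opinion": "delicious", "sentiment": "positive"},
    {"entity": "service", "opinion": "terrible",  "sentiment": "negative"},
]

# Hypothetical lookup from entities to a dataset's aspect-category labels.
ENTITY_TO_CATEGORY = {"pizza": "FOOD#QUALITY", "service": "SERVICE#GENERAL"}

def to_aspect_polarity(umr_entries):
    """Stage 2 in miniature: collapse the UMR into (category, polarity) pairs."""
    return [(ENTITY_TO_CATEGORY[e["entity"]], e["sentiment"])
            for e in umr_entries]
```

The point of the intermediate structure is visible here: once entities, opinions, and sentiments are made explicit, associating each sentiment with the right category becomes a direct lookup rather than a leap from raw text.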
Understanding Limitations and Future Research Scope
However, the research indicates that the Unified Meaning Representation does not universally improve performance, and further investigation is needed to understand which model architectures and dataset characteristics benefit most from this approach. The team acknowledges limitations stemming from the relatively small size of the available Unified Meaning Representation dataset, which restricts the ability to draw definitive conclusions. Future work will focus on identifying the specific properties of language models that make them more receptive to structured prompting, and on integrating the Unified Meaning Representation directly into the model’s training process. Additionally, extending this framework to other detailed natural language processing tasks and incorporating domain-specific examples of the Unified Meaning Representation will help establish broader principles for when structured reasoning proves advantageous. The researchers also highlight the importance of responsible deployment of these technologies, given the potential for bias and misuse inherent in large language models trained on extensive internet data.
👉 More information
🗞 Exploring Zero-Shot ACSA with Unified Meaning Representation in Chain-of-Thought Prompting
🧠 ArXiv: https://arxiv.org/abs/2512.19651
The structure provided by the Unified Meaning Representation moves beyond simple tokenization by forcing the language model to materialize abstract linguistic relationships—such as agent, action, and target—into a canonical, machine-readable format. This move towards structured synthesis mimics the intermediate steps taken by human semantic parsers, making the reasoning path explicit and significantly more robust than relying solely on the raw flow of tokens, which can often dilute the focus on critical aspect-sentiment pairings.
From a computational perspective, introducing the UMR adds an intermediate decoding step, which inherently increases the sequence length processed by the LLM compared to direct prompting. This length extension presents a scalability trade-off: while it can improve accuracy and interpretability, the model must manage both the generation of the rich UMR structure and the subsequent mapping of that structure, potentially increasing inference latency, particularly for smaller, constrained models.
Furthermore, the success of zero-shot ACSA heavily relies on the generalizability of the prompts. If the domain data deviates significantly from the natural language patterns seen during the prompt engineering phase—for instance, shifting from product reviews to medical case notes—the inherent structure encoded in the UMR may become brittle. This requires robust prompt templates that anticipate linguistic variation while maintaining the necessary formal constraints for accurate semantic parsing.
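One way to keep those formal constraints enforceable is to validate the model's UMR output before the mapping stage, re-prompting when it drifts from the expected format. The line format below ("entity | opinion | sentiment") is a hypothetical convention for illustration, not the paper's notation.

```python
import re

# Format guard for a hypothetical pipe-delimited UMR: each non-empty line must
# read "entity | opinion | sentiment" with a known polarity label.
UMR_LINE = re.compile(r"^\s*\S.*\|\s*\S.*\|\s*(positive|negative|neutral)\s*$")

def validate_umr(umr_text: str) -> bool:
    """Return True if every non-empty line matches the expected UMR format."""
    lines = [ln for ln in umr_text.splitlines() if ln.strip()]
    return bool(lines) and all(UMR_LINE.match(ln) for ln in lines)
```

A guard like this turns "brittle structure" into a detectable failure: out-of-domain text that the model cannot parse into the schema is caught at the boundary rather than silently corrupting the aspect–sentiment mapping.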
To solidify real-world deployment, future work must address data efficiency not only in labeling but also in prompt creation. Techniques like automated prompt refinement or using smaller, highly specialized foundational models (like those optimized for sequence-to-sequence tasks) could streamline the pipeline, making the high level of granularity achieved by UMR-CoT accessible to users without massive computational resources.
