Finetuned LLMs Achieve Competitive Performance on Abstract Meaning Representation Parsing

Abstract Meaning Representation (AMR), a method for encoding sentence meaning as a graph of concepts and their relationships, is increasingly being parsed with the help of large language models. Ho Shu Han from University College London presents a comprehensive evaluation of four distinct decoder-only LLM architectures, Phi 3.5, Gemma 2, LLaMA 3.2, and DeepSeek R1 LLaMA Distilled, when applied to the task of AMR parsing. The study demonstrates that simple finetuning of these models achieves performance comparable to complex, state-of-the-art AMR parsers, with LLaMA 3.2 notably matching the performance of established systems at a SMATCH F1 score of 0.804. This research highlights the potential for streamlined, high-performing semantic parsing using readily available language models and suggests a promising new direction for natural language understanding.

LLMs Directly Generate Abstract Meaning Representations

Researchers have developed a new approach to semantic parsing, using large language models (LLMs) to translate natural language directly into structured meaning representations called Abstract Meaning Representations (AMRs). Rather than relying on traditional pipelines, the work explores whether LLMs can learn this mapping directly from data by ‘finetuning’ existing decoder-only models. This moves beyond established paradigms in AMR parsing, sidestepping limitations of methods that optimise entire graph structures, incrementally build graphs, or treat parsing as a translation task. Decoder-only LLMs are a natural fit for the task: they maintain rich contextual representations and generate output token by token, aligning with the incremental construction of AMR graphs.
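To make the task concrete, the sketch below shows what a single sentence-to-AMR training pair might look like for a decoder-only model, using the canonical “the boy wants to go” example from the AMR literature; the prompt template and field names are illustrative assumptions rather than details taken from the paper.

```python
# Illustrative sketch (not from the paper): one training example pairing a
# sentence with its AMR graph in PENMAN notation, formatted as a prompt /
# completion record for a decoder-only LLM. The prompt template is an assumption.

sentence = "The boy wants to go."

# Canonical AMR for this sentence: "want-01" has the boy as its agent (:ARG0)
# and the "go-02" event as its theme (:ARG1); the boy variable is reused as the goer.
amr_graph = """\
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-02
             :ARG0 b))"""

def build_example(sentence: str, amr: str) -> dict:
    """Turn a (sentence, AMR) pair into a single finetuning record."""
    return {
        "prompt": f"Parse the following sentence into an AMR graph:\n{sentence}\nAMR:\n",
        "completion": amr,
    }

print(build_example(sentence, amr_graph)["prompt"])
```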

They adapted four state-of-the-art LLMs, Phi-3.5, Gemma-2, LLaMA-3.2, and DeepSeek R1, using parameter-efficient LoRA finetuning, allowing adaptation without extensive computational resources. The models were trained on the AMR Annotation Release 3.0 dataset, containing over 59,000 English sentences paired with their corresponding AMR graphs, and benefit from techniques such as Grouped-Query Attention (GQA) for efficiency and Chain-of-Thought reasoning. Evaluation assessed both the semantic and structural accuracy of the generated AMR graphs, employing the SMATCH metric to measure similarity between model-generated and reference graphs. Performance was also visualised by graph depth, providing insight into the models’ ability to capture complex relationships and allowing systematic assessment of the approach’s effectiveness.
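As a rough illustration of the finetuning setup described above, the following sketch attaches a LoRA adapter to a causal language model using the Hugging Face transformers and peft libraries; the checkpoint name, rank, and target modules are assumptions for illustration, not the configuration reported in the study.

```python
# Minimal LoRA setup sketch using Hugging Face transformers + peft.
# The checkpoint and hyperparameters below are illustrative assumptions,
# not the configuration reported in the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model_name = "meta-llama/Llama-3.2-3B"  # assumed checkpoint for illustration

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

lora_config = LoraConfig(
    r=16,                                 # low-rank dimension (assumption)
    lora_alpha=32,                        # scaling factor (assumption)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt (assumption)
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trained
```

Training would then proceed with a standard causal language-modelling objective over the prompt-plus-AMR sequences, updating only the adapter weights while the base model stays frozen.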

Language Models Excel at Semantic Parsing

Recent research demonstrates that sophisticated semantic analysis can now be achieved with surprising efficiency using large language models. Researchers successfully applied a straightforward fine-tuning approach to several prominent language models, including Phi 3.5, Gemma 2, LLaMA 3.2, and DeepSeek R1 LLaMA Distilled. This work challenges the conventional wisdom that accurate semantic parsing requires specialised architectures and complex processing pipelines.

Notably, the LLaMA 3.2 model attained a score of 0.804 on a standard benchmark, matching the performance of existing complex systems and approaching the highest published results. Further analysis revealed nuanced strengths among the tested models; LLaMA 3.2 consistently excelled in accurately capturing the semantic relationships within sentences, while Phi 3.5 demonstrated a particular aptitude for constructing structurally valid graph representations, suggesting potential for hybrid approaches. This research highlights a significant step towards more accessible and efficient natural language understanding, paving the way for broader applications in areas like information extraction, machine translation, and dialogue systems.
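For context on how a score such as 0.804 is produced, the sketch below compares a single predicted graph against a reference with the open-source smatch package; the example graphs are illustrative, and the usage assumes the standard Python smatch distribution.

```python
# Sketch of a single-pair SMATCH comparison with the `smatch` package
# (pip install smatch). The predicted and gold graphs here are illustrative.
import smatch

gold_amr = "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))"
pred_amr = "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02))"  # misses the reentrant :ARG0

# SMATCH aligns variables between the two graphs and counts matching triples,
# then reports precision, recall, and F1 over those triples.
match_num, test_num, gold_num = smatch.get_amr_match(pred_amr, gold_amr)
precision, recall, f_score = smatch.compute_f(match_num, test_num, gold_num)

print(f"P={precision:.3f} R={recall:.3f} SMATCH F1={f_score:.3f}")
```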

LLMs Rival State-of-the-Art AMR Parsing

This research demonstrates that straightforwardly finetuning decoder-only Large Language Models (LLMs) achieves competitive performance on the task of Abstract Meaning Representation (AMR) parsing. The team evaluated four LLM architectures, Phi 3.5, Gemma 2, LLaMA 3.2, and DeepSeek R1 LLaMA Distilled; LLaMA 3.2 specifically attained results comparable to state-of-the-art AMR parsers, achieving a SMATCH F1 score of 0.804.
Further analysis revealed nuanced strengths among the models; Phi 3.5 consistently excelled in maintaining structural validity within the generated AMR graphs, while LLaMA 3.2 demonstrated superior semantic performance. The researchers observed a convergence in performance across all models at the highest levels of complexity, indicating a potential fundamental limit in current transformer-based approaches when handling extremely complex semantic structures. Future work could explore methods to address these limitations and further improve the structural coherence and semantic accuracy of LLM-based AMR parsers.
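To make the notion of graph complexity concrete, the sketch below computes a simple depth measure for an AMR graph with the penman library, taking depth as the longest chain of relations from the root; this is an assumed definition and may not match the exact complexity measure used in the paper.

```python
# Sketch: measure AMR graph depth with the `penman` library (pip install penman).
# Depth is taken here as the longest chain of relations from the root node;
# the paper's exact depth definition may differ.
import penman

amr_string = """
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-02
             :ARG0 b))
"""

graph = penman.decode(amr_string)

# Build an adjacency list over node-to-node edges only.
children = {}
for source, _role, target in graph.edges():
    children.setdefault(source, []).append(target)

def depth(node, seen=frozenset()):
    """Longest relation path below `node`, ignoring reentrant cycles."""
    if node in seen:
        return 0
    return 1 + max((depth(child, seen | {node}) for child in children.get(node, [])), default=0)

print("graph depth:", depth(graph.top))  # root counts as depth 1
```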

👉 More information
🗞 Evaluation of LLMs in AMR Parsing
🧠 ArXiv: https://arxiv.org/abs/2508.05028
