Researchers present a method that uses large language models (LLMs) to extract character relationships from novels, a task complicated by complex narratives and implicit meaning. The approach combines dialogue analysis with contextual learning, outperforms the baseline techniques it was compared against, and is supported by a newly constructed Chinese novel relation extraction dataset.
The automated analysis of narrative structure within literature represents a continuing challenge for computational linguistics, particularly when discerning the complex web of relationships between characters. Current methods often struggle with the nuanced and implicit connections frequently found in fictional texts. Yuchen Yan, Hanjie Zhao, and colleagues from the School of Computer and Artificial Intelligence at Zhengzhou University address this issue in their research, detailed in the article “Dialogue-Based Multi-Dimensional Relationship Extraction from Novels”. Their work proposes a novel approach utilising large language models (LLMs), a type of artificial intelligence, to improve the accuracy of character relationship extraction, and includes the creation of a new, labelled dataset of Chinese novels to facilitate further investigation in this area.
Large Language Models Illuminate Character Relationships in Literary Texts
Automatically extracting relationships between characters remains a significant challenge for natural language processing, particularly in literary narratives. The researchers developed a methodology built on large language models (LLMs) that rests on three ideas: separating relationships into distinct dimensions, constructing training data from dialogue, and applying contextual learning strategies to pick up subtle cues in the text. Unlike traditional relation extraction techniques, which often treat all relationships as uniform, this method explicitly separates relationship types, such as friendship, animosity, or hierarchical status, allowing the LLM to differentiate between them more reliably.
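To make the idea of dimension separation concrete, here is a minimal Python sketch. It is not the authors' schema: the two dimensions (`Affinity`, `Hierarchy`), their values, and the character names are assumptions chosen to mirror the examples of friendship, animosity, and hierarchical status mentioned above.

```python
from dataclasses import dataclass
from enum import Enum


class Affinity(Enum):
    """Hypothetical dimension: how the two characters feel about each other."""
    FRIENDLY = "friendly"
    HOSTILE = "hostile"
    NEUTRAL = "neutral"


class Hierarchy(Enum):
    """Hypothetical dimension: the source character's status relative to the target."""
    SUPERIOR = "superior"
    SUBORDINATE = "subordinate"
    EQUAL = "equal"


@dataclass(frozen=True)
class Relation:
    """One relationship, labelled independently on each dimension."""
    source: str
    target: str
    affinity: Affinity
    hierarchy: Hierarchy

    def as_labels(self) -> dict:
        # One label per dimension instead of a single flat relation type.
        return {"affinity": self.affinity.value, "hierarchy": self.hierarchy.value}


# Illustrative names only: a friendly relationship in which Lin is the junior party.
example = Relation("Lin", "Wei", Affinity.FRIENDLY, Hierarchy.SUBORDINATE)
```

Keeping the dimensions separate lets a model label a pair as, say, friendly and subordinate at once, which a single flat label set cannot express without enumerating every combination.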
A key component of this work is a new, high-quality Chinese-language dataset, the crecil corpus, designed specifically for character relation extraction in novels to address the scarcity of labelled resources in this domain. The researchers constructed the corpus carefully, recognising that an LLM’s effectiveness depends heavily on the quality and relevance of its training data. The methodology also places significant emphasis on the structure of dialogue. Conversations are rich sources of information about character interactions and underlying relationships, and by analysing the patterns and dynamics within dialogues, the LLM can pick up implicit cues that might otherwise be missed. This involves identifying key phrases, emotional tones, and conversational turns that reveal the nature of the connection between characters.
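One simple way to turn a novel's dialogue into per-pair samples can be sketched as follows. This is an illustration under stated assumptions, not the corpus construction procedure used in the paper: it assumes speaker attribution has already been done, and the `Turn` type and `group_by_pair` helper are hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple


class Turn(NamedTuple):
    """A single attributed utterance (speaker attribution is assumed to be given)."""
    speaker: str
    text: str


def group_by_pair(turns):
    """Collect adjacent exchanges between two different speakers.

    Returns a dict keyed by the alphabetically sorted speaker pair, where each
    value is a list of (previous_turn, reply_turn) tuples.
    """
    samples = defaultdict(list)
    for prev, curr in zip(turns, turns[1:]):
        if prev.speaker != curr.speaker:  # skip a speaker continuing their own turn
            pair = tuple(sorted((prev.speaker, curr.speaker)))
            samples[pair].append((prev, curr))
    return dict(samples)
```

Each resulting pair and its exchanges could then be labelled with a relationship and used as a training or evaluation sample.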
The researchers also employ contextual learning strategies to refine the LLM’s ability to extract relationships: the model is given surrounding text and relevant background information so that it can better interpret specific interactions. A character’s past actions or motivations, for example, can provide crucial context for interpreting their current behaviour and their relationships with others. Experimental results show a significant improvement across multiple evaluation metrics, supporting the efficacy of the proposed approach. Beyond identifying individual relationships, the methodology supports the automated construction of character social networks within novels, which represent the connections between characters visually and provide a useful tool for literary analysis.
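Once relationships have been extracted, assembling them into a network is straightforward. The sketch below assumes the extraction step yields (character, label, character) triples; that format, the `build_network` helper, and the choice of an undirected graph are assumptions for illustration, and a directed graph would be a better fit for hierarchical relations.

```python
from collections import defaultdict


def build_network(triples):
    """Build an undirected adjacency map from (character_a, label, character_b) triples.

    Returns {character: {neighbour: set_of_labels}}. Duplicate triples collapse
    because labels are stored in sets.
    """
    graph = defaultdict(lambda: defaultdict(set))
    for a, label, b in triples:
        graph[a][b].add(label)
        graph[b][a].add(label)
    # Convert nested defaultdicts to plain dicts so missing keys raise KeyError.
    return {node: {n: set(labels) for n, labels in nbrs.items()}
            for node, nbrs in graph.items()}
```

The resulting adjacency map can be passed to a graph library or a visualisation tool to draw the character network.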
In summary, the research presents a method for character relation extraction from novels. The system is built on LLMs and centres on three strategies: relationship dimension separation, which sharpens the model’s focus on distinct types of connection between characters; construction of dialogue data, which uses the conversational exchanges within novels to capture character interactions; and contextual learning, which lets the model interpret relationships within the broader narrative.
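If "contextual learning" is read as in-context (few-shot) prompting, which is an assumption about the article's term, the third strategy might look like the following sketch. The prompt wording, the dictionary keys, and the `build_prompt` helper are all hypothetical and are not taken from the paper.

```python
def build_prompt(demonstrations, context, dialogue, pair):
    """Assemble a few-shot prompt: labelled demonstrations first, then the query.

    demonstrations: list of dicts with keys "dialogue", "pair", and "label".
    context: background text about the characters or scene.
    dialogue: the dialogue excerpt to classify.
    pair: tuple of the two character names.
    """
    parts = ["Identify the relationship between the two named characters."]
    for demo in demonstrations:
        parts.append(
            f"Dialogue:\n{demo['dialogue']}\n"
            f"Characters: {demo['pair'][0]}, {demo['pair'][1]}\n"
            f"Relationship: {demo['label']}"
        )
    # The final block is left unanswered so the model completes the label.
    parts.append(
        f"Background:\n{context}\n"
        f"Dialogue:\n{dialogue}\n"
        f"Characters: {pair[0]}, {pair[1]}\n"
        f"Relationship:"
    )
    return "\n\n".join(parts)
```

The returned string would be sent to whichever LLM is in use; the model's completion after the final "Relationship:" is taken as its prediction.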
To facilitate research in this area, the authors constructed the crecil corpus, a new, high-quality dataset designed specifically for Chinese novel relation extraction, which addresses the scarcity of data for training and evaluating models on this task. Experiments show that the proposed method consistently outperforms traditional baseline approaches across multiple evaluation metrics, and the model was able to construct accurate character relationship networks for the novels used in testing. The research highlights the effectiveness of combining LLMs with targeted optimisation strategies for complex natural language understanding tasks, contributing to both natural language processing and digital literary analysis. The system’s ability to map character relationships automatically opens new avenues of research into narrative structure and character dynamics.
Future work should prioritise a detailed error analysis to pinpoint the model’s specific failure modes and guide further refinement. Expanding the crecil corpus and documenting it in more detail, including inter-annotator agreement statistics, would enhance its utility and reliability. Comparative evaluations against more recent relation extraction models are also needed to establish a comprehensive benchmark, and testing whether the approach generalises beyond Chinese novels, through cross-lingual evaluation and application to diverse literary genres, is a crucial next step. Finally, more sophisticated network analysis techniques, such as centrality measures and community detection, could reveal deeper insights into the dynamics of character relationships and the underlying structure of narratives.
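As an illustration of the simplest centrality measure mentioned above, the sketch below computes normalised degree centrality, defined as a node's number of neighbours divided by n - 1 for a graph of n nodes. The adjacency format and the `degree_centrality` helper are assumptions for this example; the paper does not report such an analysis.

```python
def degree_centrality(adjacency):
    """Normalised degree centrality for an undirected graph.

    adjacency: {node: set_of_neighbours}. Each score is the fraction of the
    other nodes that a node is directly connected to.
    """
    n = len(adjacency)
    if n < 2:
        # With fewer than two nodes there are no possible neighbours.
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbours) / (n - 1)
            for node, neighbours in adjacency.items()}
```

In a character network, a high score marks a character who interacts directly with most of the cast. Community detection is more involved and is usually delegated to a graph library rather than written by hand.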
👉 More information
🗞 Dialogue-Based Multi-Dimensional Relationship Extraction from Novels
🧠 DOI: https://doi.org/10.48550/arXiv.2507.04852
