Scientists are tackling the challenge of creating consistently engaging characters in role-playing interactions with large language models. Letian Peng, Yupeng Hou, and Kun Zhou, all from the University of California, San Diego, together with Jingbo Shang and colleagues, present a novel framework called Codified Finite-State Machines (CFSMs) to model character states more effectively. This research is significant because it moves beyond simply tracking actions, focusing instead on the underlying, often unstated, states that motivate a character's behaviour. By automatically codifying textual character profiles into FSMs, and extending these to probabilistic models, the team demonstrates improved character consistency and more natural, varied interactions in both controlled experiments and real-world role-playing scenarios.
Automated character state modelling using large language models and codified finite-state machines
Scientists have developed a new framework for modelling character states in large language models, significantly improving consistency and engagement in role-playing scenarios. CFSMs automatically translate textual character profiles into functional finite-state machines using large language model-based coding, extracting key states and transitions to enforce character consistency.
The research introduces a method for automatically codifying character profiles into interpretable structures that maintain narrative coherence. This allows for uncertainty and variability in character behaviour, capturing nuanced dynamics and supporting more expressive responses.
Through both synthetic evaluations and real-world role-playing scenarios built on established narratives, CFSM and its probabilistic extension, CPFSM, consistently outperformed baseline methods. The study demonstrates effectiveness not only in structured tasks but also in open-ended, stochastic state exploration, verifying the frameworks' ability to maintain character consistency and believability.
Specifically, the framework was tested using scenarios inspired by “Super Mario”, where actions like grabbing a “Super Mushroom” trigger state changes, and stealth combat logic from “Call of Duty”. Evaluation on the Fandom Benchmark, comprising over 5,000 role-play scenes across 83 characters, revealed that CFSM and CPFSM improved behavioural consistency, transition traceability, and alignment with character profiles.
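The stealth-combat case can be pictured as a small codified machine in which scene events drive state transitions and unmatched events leave the state untouched. The sketch below is illustrative only: the state and event names are assumptions, not the paper's exact codification.

```python
# Illustrative codified FSM for stealth-combat logic; state and event
# names are hypothetical, chosen for illustration.
TRANSITIONS = {
    ("stealth", "spotted"): "alerted",
    ("alerted", "open_fire"): "combat",
    ("stealth", "open_fire"): "combat",
    ("alerted", "break_line_of_sight"): "stealth",
    ("combat", "all_enemies_down"): "stealth",
}

def step(state, event):
    """Apply one scene event; events with no registered transition
    leave the state unchanged (the default condition)."""
    return TRANSITIONS.get((state, event), state)

state = "stealth"
for event in ["spotted", "break_line_of_sight", "open_fire"]:
    state = step(state, event)
print(state)  # → combat
```

Because every transition is an explicit table entry, the machine cannot hallucinate a state change that the rules do not license, which is the property the evaluation probes.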
The probabilistic CPFSM model further enhanced expressiveness and realism by supporting multiple plausible actions with associated likelihoods, validating the strength of codified finite-state machines as a foundation for state modelling in language-driven role-play. The research begins by parsing textual character profiles to identify distinct states, including general “unactivated” and “other” states, before codifying transition rules as executable logic responsive to scene events.
This logic then processes each action during role-play to update the current state, ensuring subsequent outputs are grounded in a transparent and traceable trajectory. To extend CFSMs, the study introduces CPFSMs, which utilise a probabilistic transition matrix continuously updating state distributions based on available logits from a condition checker.
This mechanism captures subtle shifts in internal character dynamics and supports multinomial-distribution reactions with explicit likelihoods, increasing expressiveness and realism in open-ended role-play. Synthetic validation experiments, employing scenarios such as Mario’s power-up transitions and stealth combat logic from Call of Duty, were conducted to assess performance.
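The probabilistic update can be sketched as maintaining a distribution over states and pushing it through a row-stochastic transition matrix. In the sketch below, the per-state logits standing in for the condition checker's output are invented for illustration; the real checker's interface is not specified here.

```python
import math

STATES = ["calm", "alert", "combat"]  # hypothetical character states

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def update(dist, logits_per_state):
    """One CPFSM-style update: each current state contributes its
    probability mass through a softmax over condition-checker logits,
    yielding a new distribution over states."""
    new = [0.0] * len(dist)
    for i, p in enumerate(dist):
        row = softmax(logits_per_state[i])  # row of the transition matrix
        for j, t in enumerate(row):
            new[j] += p * t
    return new

dist = [1.0, 0.0, 0.0]  # start certainly "calm"
# Hypothetical logits after a "gunshot heard" event, favouring escalation.
logits = [
    [0.0, 2.0, 1.0],   # from calm
    [-1.0, 1.0, 2.0],  # from alert
    [-2.0, 0.0, 3.0],  # from combat
]
dist = update(dist, logits)  # most mass shifts to "alert"
```

Sampling an action from the resulting distribution, rather than committing to a single state, is what lets responses vary plausibly across turns.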
These experiments revealed that prompting LLMs directly with multi-action contexts often leads to confusion regarding states, misapplication of conditions, and the hallucination of transitions. Evaluation then extended to real-world narrative tasks using the Fandom Benchmark, encompassing over 5,000 role-play scenes across 83 characters.
Results demonstrate that both CFSM and CPFSM improve behavioural consistency, transition traceability, and alignment with character-defined profiles when compared to prompting and other state-modelling baselines. Further analysis involved an ablation study confirming the critical importance of explicit state registration, with its removal degrading action consistency.
A cost analysis revealed an O(n+k) codification strategy, assigning default conditions to all n states and overwriting only k profile-defined transitions, enabling scalable construction. Finally, a case study illustrated faithful tracking of dynamic states throughout episode progression, validating the framework’s efficacy.
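The O(n+k) strategy can be sketched as a two-pass construction: register a default "remain in state" behaviour for all n states, then overwrite only the k transitions the profile defines. The rule representation below is a hypothetical stand-in for the profile-derived logic.

```python
def codify(states, profile_rules):
    """O(n + k) codification: every one of the n states gets a default
    'remain in state' behaviour, then only the k profile-defined
    transitions are registered on top.
    `profile_rules` is a hypothetical {(state, event): next_state} map."""
    table = {s: {} for s in states}                       # O(n) defaults
    for (state, event), target in profile_rules.items():  # O(k) overwrites
        table[state][event] = target
    return table

def step(table, state, event):
    # Events without a registered transition fall back to the default.
    return table[state].get(event, state)

states = ["unactivated", "friendly", "hostile", "other"]
rules = {
    ("unactivated", "meets_protagonist"): "friendly",
    ("friendly", "betrayal"): "hostile",
}
table = codify(states, rules)
print(step(table, "friendly", "betrayal"))   # → hostile
print(step(table, "hostile", "small_talk"))  # → hostile (default applies)
```

The point of the default pass is scalability: a profile that mentions only a handful of transitions still yields a total, well-defined machine over the full state set.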
Comparative performance of LLMs and CFSM on state transition prediction tasks
In the synthetic evaluations, logical error rates reached 2.9% per cycle under a random action selection policy. To ensure balanced assessment, 400 test paths were sampled across the four Mario states, with 100 paths terminating in each state. The evaluation prompted the LLM with textual transition rules, an initial state, and a sequence of actions, requiring it to predict the correct final state across path lengths from 1 to 10.
While gpt-4.1 and gpt-4.1-mini exhibited declining accuracy with increasing path length, particularly without chain-of-thought prompting, CFSM consistently achieved 100% accuracy with forward time proportional only to the number of transitions. Chain-of-thought prompting improved LLM accuracy by simulating the FSM, but incurred a 25-fold increase in forward time compared to CFSM, demonstrating the efficiency of the codified approach.
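The deterministic side of that comparison amounts to executing the codified machine over each sampled action path, one table lookup per action. The four-state rule set below is an illustrative stand-in, not the paper's exact Mario FSM.

```python
import random

# Illustrative four-state Mario-style FSM; states and triggers are
# assumptions for illustration, not the paper's exact rule set.
TRANSITIONS = {
    ("small", "mushroom"): "super",
    ("small", "flower"): "fire",
    ("super", "flower"): "fire",
    ("small", "star"): "star",
    ("super", "star"): "star",
    ("fire", "star"): "star",
    ("star", "timeout"): "small",
    ("super", "hit"): "small",
    ("fire", "hit"): "small",
}
ACTIONS = ["mushroom", "flower", "star", "timeout", "hit", "coin"]

def final_state(start, path):
    """Deterministic execution: one lookup per action, so forward time
    is proportional only to the number of transitions."""
    state = start
    for action in path:
        state = TRANSITIONS.get((state, action), state)
    return state

print(final_state("small", ["mushroom", "flower", "hit"]))  # → small

# Random paths of length 1..10, mirroring the evaluation's setup; the
# codified machine reproduces the ground-truth final state by construction.
random.seed(0)
for length in range(1, 11):
    path = [random.choice(ACTIONS) for _ in range(length)]
    assert final_state("small", path) in {"small", "super", "fire", "star"}
```

This is why the CFSM's accuracy is flat at 100% regardless of path length: the "prediction" is just execution, with no opportunity for the compounding errors the prompted LLMs exhibit.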
The Fandom Benchmark, constructed from character profiles and story summaries sourced from Fandom, was used to evaluate performance in behaviour-centric role-playing. This benchmark spans six major artifacts (Haruhi, K-On., JOJO, FMA, AGOT, and ATLA), covering 83 characters and 5,141 scenes across diverse media genres.
Comparisons were made against Vanilla, Textual Profile, Codified Profile, PromptTrans, and Character Updating baselines, all utilising the same role-playing model and discriminator. Vanilla, relying solely on internal knowledge, served as the lower bound for performance, while Textual Profile prompted the LLM with the character's textual profile. In contrast, the CFSM frameworks automatically construct finite-state machines from these textual profiles, capturing key states and transitions to enforce character consistency.
By modelling transitions as probability distributions, CPFSMs further account for uncertainty and variability in character behaviour. Evaluations using both synthetic data and established role-playing scenarios demonstrate that CFSMs and CPFSMs outperform standard prompting-based methods. Specifically, the frameworks successfully tracked latent state trajectories in examples such as the character development of Iggy in JoJo’s Bizarre Adventure and Azusa’s evolving role in K-ON., guiding coherent actions aligned with the character’s current state.
The authors acknowledge limitations in the current approach, noting that CFSMs are initially built from pre-written profiles rather than directly from narrative plots. Future research will explore automatically constructing these machines from narratives, incorporating continuous dynamics through numeric mechanisms like health points, and enabling dynamic updates to the state set to allow characters to evolve and acquire new traits. These developments aim to create more adaptive and realistic character modelling systems.
👉 More information
🗞 Codified Finite-state Machines for Role-playing
🧠 ArXiv: https://arxiv.org/abs/2602.05905
