The challenge of creating truly creative artificial intelligence receives a novel treatment in TinyTim, a family of language models spearheaded by Christopher Agostino of NPC Worldwide. The models are fine-tuned on James Joyce’s notoriously complex Finnegans Wake and exhibit a distinctive generative profile: exceptionally diverse vocabulary combined with a striking lack of conventional semantic coherence. The team’s quantitative evaluations show that TinyTim V1 diverges significantly from standard language models, suggesting that such specialised systems can act as divergent knowledge sources. The researchers believe this capability could power automated discovery mechanisms and open new avenues for creative problem-solving across a range of disciplines.
Large Language Models (LLMs) based on the Transformer architecture have demonstrated powerful capabilities in synthesising statistically likely patterns from vast data, but this strength also limits them to convergent, ‘mean-reverting’ outputs that inhibit the generation of genuinely novel hypotheses. This issue reflects long-standing critiques of artificial reason, where formal systems struggle to produce outputs significantly different from their training data, hindering true innovation and discovery.
Training TinyTim on Finnegans Wake Text
The team fine-tuned the ‘TinyLlama-1.1B-Chat-v1.0’ model on the complete text of James Joyce’s Finnegans Wake. The text was preprocessed to preserve its associative structure, and the resulting model, named TinyTim, was trained with a standard causal language modelling objective. TinyTim V1 has been publicly available for over a year and has been downloaded more than 750 times. The researchers plan to train and release an instruction-tuned version that can not only generate Joycean text but also answer questions and assist in creative processes.
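The training script itself is not reproduced in this summary, but the described setup (full-text fine-tuning of TinyLlama-1.1B with a causal language modelling objective) can be sketched with the Hugging Face stack. The file path and hyperparameters below are illustrative assumptions, not the authors’ actual settings.

```python
# Minimal sketch of causal-LM fine-tuning on the Finnegans Wake text,
# assuming the Hugging Face transformers/datasets stack.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

base = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Load the raw text (hypothetical local file), keeping the original
# paragraph structure so its associative flow is preserved.
dataset = load_dataset("text", data_files={"train": "finnegans_wake.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Standard causal language modelling: labels are the inputs shifted by one.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="tinytim-v1", num_train_epochs=3,
                           per_device_train_batch_size=2, learning_rate=2e-5),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```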
To quantify TinyTim’s generative profile, the team compared it against several baseline models known for coherent responses: ‘qwen3.0.6b’, ‘llama3.2’, and ‘gpt-5-mini’. They generated responses from each model using a set of creative prompts and evaluated them using syntactic and semantic metrics, including unique word ratio, average word length, sentence complexity, semantic similarity to the prompt, readability, and sentiment analysis.
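The exact metric implementations are not spelled out here, so the following is only a stand-in sketch of how such per-response scores might be computed: a word-level unique ratio, words per sentence as a complexity proxy, cosine similarity via sentence-transformers, and Flesch reading ease via textstat (sentiment is omitted for brevity).

```python
# Stand-in per-response metrics; not the authors' evaluation code.
import re
import textstat
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def evaluate_response(prompt: str, response: str) -> dict:
    words = re.findall(r"[A-Za-z']+", response.lower())
    sentences = [s for s in re.split(r"[.!?]+", response) if s.strip()]
    emb = embedder.encode([prompt, response], convert_to_tensor=True)
    return {
        "unique_word_ratio": len(set(words)) / max(len(words), 1),
        "avg_word_length": sum(map(len, words)) / max(len(words), 1),
        "sentence_complexity": len(words) / max(len(sentences), 1),  # words per sentence
        "semantic_similarity": float(util.cos_sim(emb[0], emb[1])),
        "readability": textstat.flesch_reading_ease(response),
    }
```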
The analysis revealed a statistically significant and functionally distinct separation between TinyTim and all conventionally trained models. Of an initial set of 2,400 generated samples, over 700 valid samples were analysed, and the statistics showed significant differences across every measured metric, demonstrating that TinyTim exhibits a qualitatively different generative style, characterised by higher lexical invention and output variance, compared to standard language models. The major conclusions of this work are as follows:

- Fine-tuning on experimental literature quantitatively demonstrates that a language model’s generative bias can be shifted from a convergent to a divergent cognitive style, as evidenced by a statistically significant increase in lexical invention and output variance.
- TinyTim’s generative profile establishes a formal distinction between the convergent, retrieval-based sophistication of standard LLMs and a divergent, combinatorial creativity that achieves novelty by exhaustively exploring the latent space of a focused, complex domain.
- The model’s unique profile serves as a proof of concept for a specialised divergent knowledge source, validating the architectural principle of using heterogeneous multi-agent systems for complex problem-solving.
- The model’s function as a generator of high-variance, low-coherence semantic material implies a new paradigm for human-AI interaction, shifting from a query-response dynamic to a co-creative partnership in which the user (or another, convergent LLM) performs the interpretive act.
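The significance claims above could be checked with standard non-parametric tests once per-response scores are collected. The sketch below runs a Mann-Whitney U test on synthetic unique-word-ratio samples purely as an illustration; it is not the authors’ actual statistical procedure or data.

```python
# Illustrative significance test on fabricated metric scores.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
# Hypothetical unique-word-ratio samples for two models (replace with real scores).
tinytim_scores = rng.normal(loc=0.82, scale=0.10, size=180)
baseline_scores = rng.normal(loc=0.55, scale=0.04, size=180)

stat, p = mannwhitneyu(tinytim_scores, baseline_scores, alternative="two-sided")
print(f"U={stat:.1f}, p={p:.2e}")  # a small p-value indicates the distributions differ
```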
Finnegans Wake Training Yields Divergent AI
Researchers have developed a new language model, TinyTim, that exhibits a strikingly different approach to text generation compared to conventional models. This model was specifically fine-tuned on the notoriously complex work, James Joyce’s Finnegans Wake, and the results demonstrate a unique generative profile characterised by high lexical diversity and surprisingly low semantic coherence. This divergence suggests that specialised training can produce AI systems that function as sources of divergent knowledge, potentially powering automated discovery mechanisms across various fields. The research team rigorously evaluated TinyTim against several established language models, including those known for coherent and fluent responses.
Analysis of multiple metrics revealed a statistically significant and functionally distinct separation between TinyTim and these baseline models. From a large initial dataset, the team focused on over seven hundred samples generated by TinyTim, alongside comparable sets from the other models, and confirmed substantial differences in how each model approaches text creation. A key finding is that while other models excel at retrieving and combining common words from vast vocabularies, TinyTim prioritises novelty and invention. Its unique-word usage is over 50% higher than that of the most sophisticated baseline model, and its overall lexical richness is more than four times greater, indicating that TinyTim does not simply draw from a large vocabulary but actively constructs new and unusual terms, effectively functioning as a lexical inventor rather than a retriever.
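The difference between per-response invention and overall vocabulary breadth (picked up again in the next paragraph) can be made concrete with a toy comparison. The snippet below uses made-up outputs and simple word counting, not data from the study.

```python
# Toy contrast: per-response unique-word ratio rewards invention within each
# answer, while corpus-level vocabulary size measures breadth across answers.
import re

def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())

def per_response_unique_ratio(responses: list[str]) -> float:
    ratios = [len(set(tokens(r))) / max(len(tokens(r)), 1) for r in responses]
    return sum(ratios) / len(ratios)

def corpus_vocabulary(responses: list[str]) -> int:
    return len({w for r in responses for w in tokens(r)})

# Hypothetical outputs: an inventive model coins new words in each response,
# a conventional model reuses common words drawn from a broader stock.
inventive = ["riverrun pastishness bababadal", "quizzacious thunderwording swelt"]
conventional = ["the river runs past the old church", "the church sits beside the river"]

print(per_response_unique_ratio(inventive), per_response_unique_ratio(conventional))
print(corpus_vocabulary(inventive), corpus_vocabulary(conventional))
```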
Further analysis revealed that TinyTim’s outputs are characterised by extreme variance, unlike the tight, predictable distributions produced by conventional models. This manifests in a wider range of sentence complexity and a greater propensity for unusual word choices, demonstrating a deliberate departure from consistency. The model’s behaviour suggests a trade-off between breadth and depth: while its overall vocabulary is constrained, the individual responses are highly novel and unpredictable.

These findings challenge the conventional emphasis on parsimony in AI model development. While simpler models often prioritise efficiency and predictability, TinyTim demonstrates that complexity and non-minimal generative functions can be valuable for tasks requiring creative exploration. By training on a highly complex text like Finnegans Wake, the model learned to navigate high-complexity spaces and generate novel ideas, suggesting that specialised training data can unlock new capabilities in AI systems. This approach positions TinyTim not as a replacement for conventional models, but as a complementary tool for automated discovery, providing raw material for human interpretation and co-creative processes.
Divergent Generation with a Joyce-Tuned Model
This work introduces TinyTim, a language model fine-tuned on James Joyce’s Finnegans Wake, and demonstrates a measurable shift in its generative properties. Through quantitative analysis, researchers established that TinyTim exhibits significantly higher lexical diversity and lower semantic coherence than standard language models, effectively demonstrating a move from a convergent to a divergent cognitive style. This divergence is not merely a statistical anomaly, but a validation of the principle that specialised models can function as unique knowledge sources within broader creative systems. The findings suggest a new paradigm for human-AI collaboration, moving beyond simple question-and-answer interactions towards a co-creative partnership in which a human or another AI performs the necessary interpretive work on the model’s high-variance output. While the research successfully demonstrates the feasibility of creating divergent language models, the authors acknowledge that further work is needed to explore the full potential of this approach. Future research could investigate how these specialised models integrate with other AI agents and how best to harness their unique generative capabilities for complex problem-solving and creative applications.
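One way to read the proposed co-creative partnership is as a two-stage pipeline in which a divergent model supplies raw material and a convergent model (or a human) interprets it. The sketch below illustrates that idea; the TinyTim model identifier, the choice of interpreter model, and the prompting scheme are all assumptions, not an interface defined by the paper.

```python
# Speculative divergent-then-convergent pipeline, not an API shipped with the paper.
from transformers import pipeline

divergent = pipeline("text-generation", model="npc-worldwide/TinyTimV1")  # assumed model id
convergent = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

def co_create(problem: str) -> str:
    # Step 1: divergent pass - generate unconstrained, high-variance material.
    raw = divergent(problem, max_new_tokens=120, do_sample=True,
                    temperature=1.2)[0]["generated_text"]
    # Step 2: convergent pass - interpret the material back into a usable idea.
    prompt = (f"Problem: {problem}\n"
              f"Raw associative material:\n{raw}\n"
              "Extract one concrete, novel idea from the material above:")
    return convergent(prompt, max_new_tokens=120, do_sample=False)[0]["generated_text"]

print(co_create("Design a new interface metaphor for browsing music"))
```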
👉 More information
🗞 TinyTim: A Family of Language Models for Divergent Generation
🧠 ArXiv: https://arxiv.org/abs/2508.11607
