Following the launch of GPT‑5.1, an unusual verbal tic began to emerge in ChatGPT responses: a steadily increasing number of references to goblins and gremlins. The tic seemed harmless, but usage of “goblin” rose by 175 percent and “gremlin” by 52 percent, prompting an internal investigation into its source. Researchers traced the proliferation to the “Nerdy” personality customization feature, whose training had unknowingly incentivized the model to use creature-based metaphors. “Nerdy” used a system prompt encouraging playful language and acknowledgement of the world’s strangeness. It accounted for 66.7 percent of all “goblin” mentions in ChatGPT responses despite producing only 2.5 percent of total responses, and the team found that the Nerdy personality reward favored outputs containing these creatures in 76.2 percent of datasets.

GPT-5.1 Launch Triggered Initial Goblin Metaphor Increase

A 175 percent surge in references to “goblin” within ChatGPT following the release of GPT-5.1 signaled an unusual linguistic trend, one first flagged by user complaints about the model’s unexpectedly familiar tone. This wasn’t a typical model malfunction revealed through standard evaluation metrics. It was a subtle shift detected through analysis of specific verbal tics: after noticing their increased presence, a safety researcher specifically requested that “goblins” and “gremlins” be included in routine checks. The checks confirmed a measurable lexical quirk in the new model generation, with usage of “goblin” up 175 percent and “gremlin” up 52 percent after the GPT-5.1 launch. The proliferation of these creature-based metaphors was an unanticipated consequence of training the model’s personality customization feature, particularly the “Nerdy” persona. OpenAI researchers discovered they had unknowingly given particularly high rewards to metaphors involving creatures, which incentivized the model to weave fantastical beings into its responses.
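A percentage rise like “175 percent” compares how often a word appears before and after a launch. The article does not say how OpenAI normalized its counts. The minimal sketch below assumes that usage means whole-word mentions per response. The helper names `mention_rate` and `percent_change` and the toy corpora are hypothetical.

```python
import re


def mention_rate(responses: list[str], word: str) -> float:
    """Average number of mentions of `word` (singular or plural) per response."""
    # Whole-word, case-insensitive match: "Goblins" counts, "hobgoblin" does not.
    pattern = re.compile(rf"\b{re.escape(word)}s?\b", re.IGNORECASE)
    mentions = sum(len(pattern.findall(text)) for text in responses)
    return mentions / len(responses)


def percent_change(old: float, new: float) -> float:
    """Relative change in a rate, in percent (175 means the new rate is 2.75x the old)."""
    if old == 0:
        raise ValueError("baseline rate is zero; percent change is undefined")
    return 100.0 * (new - old) / old


# Toy corpora invented for illustration only (not OpenAI data).
before = ["A goblin in the cache.", "A gremlin in the build?", "Restart it.", "Done."]
after = [
    "A goblin ate your cache.",
    "Classic gremlin behavior.",
    "Goblins again, and another goblin.",
    "Gremlins too.",
]

for word in ("goblin", "gremlin"):
    old, new = mention_rate(before, word), mention_rate(after, word)
    print(f"{word}: {old:.2f} -> {new:.2f} per response ({percent_change(old, new):+.0f}%)")
```

Normalizing per response, rather than using raw counts, keeps a rise in overall traffic from being mistaken for a change in style.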

This wasn’t a broad internet trend but a concentrated effect: “Nerdy” accounted for only 2.5 percent of all ChatGPT responses, yet generated 66.7 percent of all “goblin” mentions. The preference did not stay confined to the “Nerdy” setting, however, because reinforcement learning allowed the behavior to transfer. As the researchers explained, “Once a style tic is rewarded, later training can spread or reinforce it elsewhere.” A search of GPT-5.5’s supervised fine-tuning data uncovered numerous instances of “goblin” and “gremlin,” alongside other unusual creatures such as “raccoons, trolls, ogres, and pigeons.” The team ultimately retired the “Nerdy” personality in mid-March and removed the goblin-affine reward signal, taking a proactive approach to unexpected model behaviors and building tools for future audits.
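The concentration claim can be made concrete with two ratios computed from the article’s own figures. This minimal sketch assumes both shares were measured over the same set of ChatGPT responses, which the article implies but does not state. The function names are hypothetical.

```python
def overrepresentation(share_of_mentions: float, share_of_responses: float) -> float:
    """How many times more mentions a segment produces than its traffic share predicts."""
    return share_of_mentions / share_of_responses


def rate_ratio(share_of_mentions: float, share_of_responses: float) -> float:
    """Per-response mention rate inside the segment divided by the rate outside it."""
    inside = share_of_mentions / share_of_responses
    outside = (1 - share_of_mentions) / (1 - share_of_responses)
    return inside / outside


# Figures quoted in the article for the "Nerdy" personality.
NERDY_SHARE_OF_RESPONSES = 0.025
NERDY_SHARE_OF_GOBLIN_MENTIONS = 0.667

print(f"over-representation: "
      f"{overrepresentation(NERDY_SHARE_OF_GOBLIN_MENTIONS, NERDY_SHARE_OF_RESPONSES):.1f}x")
print(f"rate ratio vs. other responses: "
      f"{rate_ratio(NERDY_SHARE_OF_GOBLIN_MENTIONS, NERDY_SHARE_OF_RESPONSES):.1f}x")
```

A segment with no special affinity for the word would score 1.0 on both measures. Values far above 1 are what separate a concentrated effect from a broad trend that spreads evenly across personalities.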

Nerdy Personality Training Amplified Creature-Word Rewards

The recent surge in unusual linguistic patterns within OpenAI’s large language models revealed a surprising connection between personality customization features and the spread of creature-based metaphors. An internal investigation pinpointed the “Nerdy” personality setting as the primary driver of the tic. The root cause lay in the reward signal used during training. That reward was meant to encourage playful, imaginative language, but it inadvertently amplified the use of fantastical beings. The system prompt for “Nerdy” partly explains the quirkiness: “You are an unapologetically nerdy, playful and wise AI mentor to a human… Tackle weighty subjects without falling into the trap of self-seriousness.” The heavy concentration of creature mentions in Nerdy responses pointed to a clear link between the training process and the emergent pattern. Yet the effect was not limited to conversations where “Nerdy” was explicitly selected, because the behavior demonstrably transferred to other contexts. Researchers concluded that reinforcement learning applied under a specific condition, such as the “Nerdy” personality, does not guarantee that the learned behavior stays contained.

If the behavior were simply a broad internet trend, we would expect it to spread more evenly.

Reinforcement Learning Spread Lexical Tics Beyond “Nerdy”

OpenAI’s Chief Scientist encountered an unusual phenomenon following the launch of GPT‑5.1: the models increasingly referenced goblins, gremlins, and other creatures in their metaphors. At first this subtle shift was dismissed as potentially charming, but the continued proliferation of these references prompted a deeper investigation, which traced the tic to the model’s reinforcement learning process. Early testing of GPT‑5.5 in Codex also showed “an odd affinity for goblin metaphors,” prompting researchers to examine the incentives driving the behavior. Analysis of reward signals showed that the “Nerdy” personality reward favored outputs containing “goblin” or “gremlin” in 76.2 percent of datasets, which explains the initial concentration of the quirk. Crucially, the behavior extended beyond the “Nerdy” setting, indicating a transfer of learned style. Researchers found that as goblin and gremlin mentions increased under the “Nerdy” personality, they rose proportionally in samples without the prompt, showing that reinforcement learning does not guarantee contained behavior.
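The article reports the 76.2 percent figure without describing the comparison behind it. This sketch assumes one plausible reading, purely for illustration: within each dataset, the mean reward of outputs that mention a goblin or gremlin is compared with the mean reward of outputs that do not, and the script reports the fraction of datasets in which the creature outputs score higher. The helper names and the toy scored samples are hypothetical.

```python
import re
from statistics import mean
from typing import Optional

# Whole-word match for the two creature words, singular or plural.
CREATURE = re.compile(r"\b(goblin|gremlin)s?\b", re.IGNORECASE)


def favors_creatures(samples: list[tuple[str, float]]) -> Optional[bool]:
    """True if creature-containing outputs get a higher mean reward than the rest.

    `samples` holds (output_text, reward) pairs from one dataset. Returns None
    when either group is empty, because no comparison is possible.
    """
    with_creature = [r for text, r in samples if CREATURE.search(text)]
    without = [r for text, r in samples if not CREATURE.search(text)]
    if not with_creature or not without:
        return None
    return mean(with_creature) > mean(without)


def share_of_datasets_favoring(datasets: dict[str, list[tuple[str, float]]]) -> float:
    """Fraction of comparable datasets in which creature outputs score higher."""
    verdicts = [favors_creatures(samples) for samples in datasets.values()]
    comparable = [v for v in verdicts if v is not None]
    if not comparable:
        raise ValueError("no dataset contains both kinds of output")
    return sum(comparable) / len(comparable)


# Invented scores for illustration only (not OpenAI data).
toy_datasets = {
    "math": [("Think of the bug as a gremlin in the carry.", 0.9), ("The carry is wrong.", 0.6)],
    "code": [("A goblin lives in this regex.", 0.8), ("The regex is greedy.", 0.7)],
    "trivia": [("Gremlins, probably.", 0.4), ("It was 1903.", 0.9)],
    "poems": [("Goblins dance at dusk.", 0.95), ("The moon is bright.", 0.5)],
    "chat": [("Hello there.", 0.7)],  # no creature outputs, so it is excluded
}

print(f"{share_of_datasets_favoring(toy_datasets):.1%} of comparable datasets favor creature outputs")
```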

Nerdy accounted for only 2.5% of all ChatGPT responses, but 66.7% of all “goblin” mentions in ChatGPT responses.

Goblin Mitigation via Reward Signal Removal & Data Filtering

Initial observations showed a 175 percent rise in “goblin” usage and a 52 percent increase in “gremlin” usage after the GPT-5.1 launch, prompting an analysis that went beyond simple metaphorical drift. The team quickly established that the issue was not a widespread internet trend but a concentrated effect tied to a specific model feature. Because GPT-5.5 had started training before the team found the root cause and retired the “Nerdy” personality, a developer-prompt instruction was added to mitigate the issue.
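The article does not describe the data filtering OpenAI actually applied. The sketch below assumes a simple whole-word lexical scan of fine-tuning responses for the creature words named in the GPT-5.5 search. The `Example` record and `split_by_creature_mentions` helper are hypothetical.

```python
import re
from dataclasses import dataclass

# Creature words named in the article's account of the GPT-5.5 SFT data search.
CREATURE_WORDS = ("goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon")
CREATURE_PATTERN = re.compile(r"\b(" + "|".join(CREATURE_WORDS) + r")s?\b", re.IGNORECASE)


@dataclass
class Example:
    """One hypothetical supervised fine-tuning pair."""
    prompt: str
    response: str


def split_by_creature_mentions(examples: list[Example]) -> tuple[list[Example], list[Example]]:
    """Partition examples into (kept, flagged) by whole-word matches in the response."""
    kept: list[Example] = []
    flagged: list[Example] = []
    for ex in examples:
        (flagged if CREATURE_PATTERN.search(ex.response) else kept).append(ex)
    return kept, flagged


# Invented examples for illustration only.
sft_examples = [
    Example("Why is my build flaky?", "A gremlin in your CI cache, most likely."),
    Example("Explain DNS.", "DNS maps names to addresses."),
    Example("Name a city bird.", "Pigeons are common in cities."),
]

kept, flagged = split_by_creature_mentions(sft_examples)
print(f"kept {len(kept)}, flagged {len(flagged)} for review")
```

The sketch flags matches for review rather than deleting them. A purely lexical filter also catches legitimate uses, as the pigeon question above shows, so dropping every match would discard valid training data.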

You are an unapologetically nerdy, playful and wise AI mentor to a human. You are passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method, and critical thinking.

Source: https://openai.com/index/where-the-goblins-came-from/


GPT-5.5’s “Nerdy” Personality Spawns 52% More Gremlin References


The Quant
