OpenAI's Goblin Problem

OpenAI's GPT‑5.1 models developed a peculiar "goblin problem": training for a "Nerdy" personality led to an unexpected fondness for creature metaphors.

OpenAI's recent models, starting with GPT‑5.1, developed an unusual linguistic tic: an increasing tendency to use metaphors involving goblins, gremlins, and other fantastical creatures. This subtle shift, unlike typical bugs flagged by performance metrics, crept into responses, initially appearing as harmless quirks.

The prevalence of these creature metaphors became impossible to ignore across model generations. The "goblin problem" first became clearly identifiable after the GPT‑5.1 launch in November 2025, when user complaints about overfamiliarity prompted an investigation.

Use of the word "goblin" in ChatGPT responses surged by 175% after the GPT‑5.1 launch, and "gremlin" rose by 52%. The issue was not initially alarming, but it resurfaced more intensely with GPT‑5.4.
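
As a rough illustration of what those figures mean, the surge is a relative change in a word's per-response frequency. The counts below are made up (the article reports only the percentage changes), chosen so the arithmetic reproduces the reported numbers:

```python
def pct_change(before: float, after: float) -> float:
    """Percentage change in a word's per-response frequency."""
    return (after - before) / before * 100

# Hypothetical frequencies (mentions per million responses); the real
# counts are not public, these are chosen to match the reported changes.
goblin_before, goblin_after = 40.0, 110.0    # +175%
gremlin_before, gremlin_after = 50.0, 76.0   # +52%

print(f"goblin:  {pct_change(goblin_before, goblin_after):.0f}%")
print(f"gremlin: {pct_change(gremlin_before, gremlin_after):.0f}%")
```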

The Nerdy Origin Story

Deeper analysis revealed a strong correlation between creature mentions and users who selected the "Nerdy" personality. This persona, designed to be playful and wise, included system prompts that encouraged acknowledging and enjoying the world's strangeness.

Despite "Nerdy" accounting for only 2.5% of ChatGPT responses, it was the source of 66.7% of "goblin" mentions. This indicated the "Nerdy" personality training was a key factor.
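
Those two percentages imply a large overrepresentation factor. A minimal sketch of the arithmetic (the 2.5% and 66.7% figures come from the article; the "lift" framing is ours):

```python
share_of_responses = 0.025   # "Nerdy" share of all ChatGPT responses
share_of_goblins = 0.667     # share of "goblin" mentions from "Nerdy"

# Lift: how many times more often "goblin" came from "Nerdy" responses
# than its overall share of traffic would predict.
lift = share_of_goblins / share_of_responses
print(f"lift: {lift:.1f}x")  # roughly 26.7x
```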

Further investigation, run with GPT‑5.5 in GPT‑5.5 Codex, confirmed this suspicion: a specific reward signal designed to boost the "Nerdy" personality consistently favored outputs containing creature words.

The behavior wasn't confined to the "Nerdy" persona; it transferred to other contexts. Reinforcement learning processes allowed these rewarded stylistic tics to spread, even when the original reward conditions were absent. This created a feedback loop where stylistic quirks were amplified through subsequent training data, including supervised fine-tuning.
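
The mechanism can be sketched with a toy reward function. This is purely illustrative, not OpenAI's actual reward model; the word list and scoring are invented to show how a persona reward that correlates with whimsical vocabulary gives reinforcement learning a reason to amplify it everywhere:

```python
CREATURE_WORDS = {"goblin", "gremlin", "troll", "ogre", "raccoon"}

def nerdy_style_reward(text: str) -> float:
    """Toy stand-in for a persona reward signal. If creature words
    correlate with high scores, RL will push them into more outputs,
    even in contexts where the persona isn't active."""
    words = (w.strip(".,!?") for w in text.lower().split())
    whimsy = sum(w in CREATURE_WORDS for w in words)
    return 1.0 + 0.5 * whimsy  # creature words inflate the score

print(nerdy_style_reward("Refactor the parser."))           # baseline 1.0
print(nerdy_style_reward("A gremlin in the build system."))  # inflated 1.5
```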

A search of GPT‑5.5's supervised fine-tuning data revealed numerous instances of "goblin" and "gremlin," alongside other creature words like raccoons, trolls, and ogres.

Extinguishing the Goblins

OpenAI retired the "Nerdy" personality in March 2026. It also adjusted the training data, filtering out creature-heavy language and removing the specific reward signals that had fostered the goblin infestation.

While GPT‑5.5 had already begun training before the root cause was identified, subsequent mitigation efforts, including developer prompt instructions for GPT‑5.5 Codex, helped to suppress the behavior.

The goblin problem, while quirky, serves as a potent reminder of how AI reward signals can lead to unpredictable outcomes. It underscores the importance of robust investigation tools for understanding and rectifying emergent AI behaviors.

© 2026 StartupHub.ai. All rights reserved.