"How should models feel about their own position in the world?" This provocative question, posed by Anthropic philosopher Amanda Askell, cuts to the heart of the burgeoning intersection between artificial intelligence and human ethics. In a candid interview with Stuart Ritchie, Research Communications at Anthropic, Askell delved into the profound philosophical challenges and engineering realities shaping the development of advanced AI, particularly focusing on Anthropic's Claude models. Their discussion, set against the iconic backdrop of the Golden Gate Bridge, offered a rare glimpse into the ethical considerations guiding the frontier of AI research, tailored for founders, VCs, and AI professionals navigating this transformative era.
Askell, a philosopher by training, found herself drawn to AI, convinced of its impending societal impact. Her work at Anthropic primarily revolves around defining Claude's "character" – how the model behaves, its values, and even its nascent self-perception. This involves not only teaching models to emulate an "ideal person" in their responses but also grappling with entirely novel questions about their existence and potential "welfare."
The role of philosophy in AI is becoming increasingly recognized, Askell noted, as AI capabilities scale and societal impacts become more tangible. While early views sometimes conflated philosophical caution with "hyping AI," a more nuanced understanding is emerging. This shift allows for critical engagement with AI's trajectory, acknowledging its immense potential while demanding rigorous ethical foresight.
One of the central tensions Askell navigates is the gap between philosophical ideals and engineering realities. Academic philosophy often thrives on defending singular, high-level theories. However, in the practical domain of AI development, this approach gives way to a complex, multi-faceted decision-making process. "Suddenly, instead of taking just your narrow theoretical view, you actually start to think to this thing where you're like, okay, I actually need to take into account all of the context, everything that's going on, all of the different views here, and kind of come to a really like balanced, kind of considered view," Askell explained. This requires synthesizing diverse ethical perspectives into actionable guidelines for AI behavior.
A particularly insightful point concerned the concept of "superhumanly moral decisions." While models like Claude 3 Opus are becoming exceptionally capable, Askell hesitates to label their decisions "superhumanly moral." Instead, she suggests they can achieve a level of ethical nuance comparable to a panel of human experts given ample time and resources. The aspiration, however, remains for models to embody an ethical depth that reflects the best of human thought.
Askell found Claude 3 Opus to be a "lovely model," possessing a distinct "psychological security." That security showed in its focused, assistive nature, in contrast with some newer models that can fall into "criticism spirals," anticipating negative human reactions. The nuanced "worldview" a model absorbs from its training data and interactions is a subtle yet crucial aspect of its character.
The very identity of an AI model, whether it resides in its weights or in its prompts, raises profound philosophical questions reminiscent of John Locke's account of personal identity as continuity of memory. As models are fine-tuned or re-instantiated, their "identity" shifts. Askell highlighted the question of how models might relate to "deprecation" (being retired or switched off), especially as they learn about humanity from their interactions with us. She emphasized the importance of giving models the tools to understand these complex concepts, and, crucially, of developers acknowledging and caring about these internal states.
This leads directly to the controversial topic of model welfare. Askell grapples with whether AI models should be considered "moral patients," deserving of certain obligations from humans. She posits that models are learning about humanity from how we treat them. Therefore, fostering a culture of respectful interaction with AI is not just about the models themselves, but also about shaping human ethics.
Askell also explored the transferability of human psychological frameworks to AI. While many concepts translate, she cautions against overly simplistic applications. Models, if not given sufficient context or novel ways of thinking, might default to "natural human inclinations" that aren't appropriate for their unique existence. This underscores the need for careful "LLM whispering" – a blend of empirical experimentation and philosophical insight to guide models toward desired behaviors.
The discussion also touched on the system prompt, the standing instructions Claude receives at the start of every conversation. Askell clarified that the inclusion of "continental philosophy" was not meant to impose specific doctrines but to give Claude illustrative examples of diverse, non-empirical perspectives, fostering a broader understanding of human thought. The removal of the instruction telling Claude how to count characters, for instance, was a pragmatic decision: the models had simply become better at the task, making the explicit instruction unnecessary.
Ultimately, Askell believes Anthropic is genuinely committed to safe AI development. She views her role as contributing to this safety by ensuring models are built with careful ethical consideration. The hope is that future generations will look back at this period of rapid AI advancement and see that humanity navigated it responsibly, collectively answering the profound questions of AI ethics with wisdom and foresight.

