Artificial Intelligence

Preferred on Google

Erik Hanchett: Cut AI Agent Token Costs

AWS Developer Advocate Erik Hanchett shares five essential strategies to cut AI agent token costs, including caching prompts, routing by difficulty, and managing conversation history.

Jun 28 at 11:01 PM10 min read

Erik Hanchett of AWS presenting on reducing AI agent token costs. — Erik Hanchett, Developer Advocate at AWS, discusses strategies for optimizing AI agent token consumption.· AI Engineer

In the rapidly evolving world of AI agents, managing operational costs is becoming as critical as the agent's functionality. Erik Hanchett, a Developer Advocate at AWS, delivered a concise presentation titled "Your Agent Is Wasting Tokens and You Don't Know It," highlighting five key strategies to significantly reduce token expenditure and optimize AI agent performance.

Erik Hanchett: Cut AI Agent Token Costs - AI Engineer — Erik Hanchett: Cut AI Agent Token Costs — from AI Engineer

Visual TL;DR. Erik Hanchett identifies Unseen Token Waste. Unseen Token Waste solved by Cache System Prompt. Unseen Token Waste solved by Route by Difficulty. Unseen Token Waste solved by Offload Big Tool Results. Unseen Token Waste solved by Cap Tool Loops. Cache System Prompt leads to Cut Token Costs. Route by Difficulty leads to Cut Token Costs. Offload Big Tool Results leads to Cut Token Costs. Cap Tool Loops leads to Cut Token Costs. Cut Token Costs enables Optimize Performance.

Related startups

Erik Hanchett: AWS Developer Advocate sharing AI agent cost-saving strategies
Unseen Token Waste: AI agents silently overconsume tokens, increasing operational costs
Cache System Prompt: Store and reuse the initial system prompt to avoid repetition
Route by Difficulty: Direct simpler tasks to cheaper models, complex to powerful ones
Offload Big Tool Results: Process large tool outputs separately to save agent tokens
Cap Tool Loops: Limit the number of times tools can be called in a loop
Cut Token Costs: Significantly reduce AI agent operational expenditure
Optimize Performance: Improve AI agent efficiency and scalability

Visual TL;DRQuickExplainDeeper

Who Is Erik Hanchett?

Erik Hanchett is a Senior Developer Advocate at Amazon Web Services (AWS). In his role, he focuses on empowering developers to build and deploy applications, often by explaining complex cloud services and best practices. His expertise lies in making advanced technologies accessible and practical for development teams.

The Problem: Unseen Token Waste

Hanchett immediately dives into a common pitfall for developers building AI agents: the silent and often unnoticed overconsumption of tokens. This waste directly translates to higher operational costs and can impact the scalability and economic viability of AI-powered applications. He frames the presentation around five practical "fixes" to address this issue.

Fix 1: Cache the System Prompt

The first strategy addresses the repetitive sending of system prompts. Hanchett demonstrates a code snippet showing how to define an agent using AWS's Bedrock models. By setting a cache_prompt="default" parameter for the system prompt, the agent can store and reuse this information. This means the full, often lengthy, system prompt is only sent on the initial call. Subsequent interactions will use the cached version, saving tokens on every turn.

Fix 2: Route by Difficulty

Not all tasks require the most powerful and expensive language models. Hanchett suggests implementing a routing mechanism that dispatches tasks based on their complexity. His example shows a Python function pick_model(task) that checks if a task is simple. If it is, it returns a less expensive model like "claude-haiku." For more complex tasks, it defaults to a more capable, but costlier, model such as "claude-sonnet." This dynamic model selection ensures that developers are not overspending on simpler requests.

Fix 3: Offload Big Tool Results

When AI agents interact with tools, the output from these tools can be substantial, consuming many tokens. Hanchett highlights the inefficiency of sending the full, raw output of a tool back into the agent's context. For instance, a tool that retrieves a report might return 10,000 tokens worth of data. Instead, he advocates for offloading or summarizing these large results. The example shows a fetch_report function that retrieves data, stores it, and then uses a summarize function with a reference. Only the summarized output is then passed back to the agent, drastically reducing the token count sent in subsequent processing steps.

Fix 4: Cap Your Tool Loops

A critical issue that can lead to runaway token costs is when an agent gets stuck in an infinite loop of tool calls. Hanchett demonstrates how to prevent this by setting a max_iterations parameter on the agent. For example, setting max_iterations=8 will halt the agent's execution after eight consecutive tool calls, regardless of whether the task is complete. This acts as a safeguard against excessive token consumption from malfunctioning or looping agent behavior.

Fix 5: Trim the History

For conversational agents, maintaining a long history of interactions is crucial for context, but it can also lead to massive token usage. Hanchett introduces the concept of a "sliding window conversation manager." His example shows how to use a SlidingWindowConversationManager with a specified window_size=10. This approach means that only the last 10 turns of the conversation are kept in context and sent to the language model. Older messages are effectively dropped, preventing the context from growing indefinitely and significantly reducing token costs in long conversations.

Recap: Five Fixes for a Smaller Bill

Hanchett summarizes the five key strategies for optimizing AI agent token usage:

Cache the System Prompt: Stop resending the static prompt every turn.
Route by Difficulty: Use a cheaper model for simple calls.
Offload Big Tool Results: Keep 10k-token blobs out of context by summarizing them.
Cap Your Tool Loops: Implement max_iterations to prevent runaway calls.
Trim the History: Use a window or summarize old turns to manage context length.

By implementing these practical measures, developers can ensure their AI agents are not only effective but also cost-efficient, making them more sustainable for production environments.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.

#Erik Hanchett #AWS #AI Agents #Large Language Models #Token Optimization #Cost Reduction #Developer Advocate #Cloud Computing

AI Daily Digest

Get the most important AI news daily.

+40k readers