Erik Hanchett: Cut AI Agent Token Costs

AWS Developer Advocate Erik Hanchett shares five essential strategies to cut AI agent token costs, including caching prompts, routing by difficulty, and managing conversation history.

10 min read
Erik Hanchett of AWS presenting on reducing AI agent token costs.
Erik Hanchett, Developer Advocate at AWS, discusses strategies for optimizing AI agent token consumption.· AI Engineer

In the rapidly evolving world of AI agents, managing operational costs is becoming as critical as the agent's functionality. Erik Hanchett, a Developer Advocate at AWS, delivered a concise presentation titled "Your Agent Is Wasting Tokens and You Don't Know It," highlighting five key strategies to significantly reduce token expenditure and optimize AI agent performance.

Erik Hanchett: Cut AI Agent Token Costs - AI Engineer
Erik Hanchett: Cut AI Agent Token Costs — from AI Engineer

Visual TL;DR. Erik Hanchett identifies Unseen Token Waste. Unseen Token Waste solved by Cache System Prompt. Unseen Token Waste solved by Route by Difficulty. Unseen Token Waste solved by Offload Big Tool Results. Unseen Token Waste solved by Cap Tool Loops. Cache System Prompt leads to Cut Token Costs. Route by Difficulty leads to Cut Token Costs. Offload Big Tool Results leads to Cut Token Costs. Cap Tool Loops leads to Cut Token Costs. Cut Token Costs enables Optimize Performance.

Related startups

  1. Erik Hanchett: AWS Developer Advocate sharing AI agent cost-saving strategies
  2. Unseen Token Waste: AI agents silently overconsume tokens, increasing operational costs
  3. Cache System Prompt: Store and reuse the initial system prompt to avoid repetition
  4. Route by Difficulty: Direct simpler tasks to cheaper models, complex to powerful ones
  5. Offload Big Tool Results: Process large tool outputs separately to save agent tokens
  6. Cap Tool Loops: Limit the number of times tools can be called in a loop
  7. Cut Token Costs: Significantly reduce AI agent operational expenditure
  8. Optimize Performance: Improve AI agent efficiency and scalability
Visual TL;DR
Visual TL;DR, startuphub.ai Erik Hanchett identifies Unseen Token Waste. Unseen Token Waste solved by Cache System Prompt. Unseen Token Waste solved by Route by Difficulty. Cache System Prompt leads to Cut Token Costs. Route by Difficulty leads to Cut Token Costs identifies solved by solved by leads to leads to Erik Hanchett Unseen Token Waste Cache System Prompt Route by Difficulty Cut Token Costs From startuphub.ai · The publishers behind this format
Visual TL;DR, startuphub.ai Erik Hanchett identifies Unseen Token Waste. Unseen Token Waste solved by Cache System Prompt. Unseen Token Waste solved by Route by Difficulty. Cache System Prompt leads to Cut Token Costs. Route by Difficulty leads to Cut Token Costs identifies solved by solved by leads to leads to Erik Hanchett Unseen TokenWaste Cache SystemPrompt Route byDifficulty Cut Token Costs From startuphub.ai · The publishers behind this format
Visual TL;DR, startuphub.ai Erik Hanchett identifies Unseen Token Waste. Unseen Token Waste solved by Cache System Prompt. Unseen Token Waste solved by Route by Difficulty. Cache System Prompt leads to Cut Token Costs. Route by Difficulty leads to Cut Token Costs identifies solved by solved by leads to leads to Erik Hanchett AWS Developer Advocate sharing AI agentcost-saving strategies Unseen Token Waste AI agents silently overconsume tokens,increasing operational costs Cache System Prompt Store and reuse the initial system promptto avoid repetition Route by Difficulty Direct simpler tasks to cheaper models,complex to powerful ones Cut Token Costs Significantly reduce AI agent operationalexpenditure From startuphub.ai · The publishers behind this format
Visual TL;DR, startuphub.ai Erik Hanchett identifies Unseen Token Waste. Unseen Token Waste solved by Cache System Prompt. Unseen Token Waste solved by Route by Difficulty. Cache System Prompt leads to Cut Token Costs. Route by Difficulty leads to Cut Token Costs identifies solved by solved by leads to leads to Erik Hanchett AWS DeveloperAdvocate sharing AIagent cost-saving… Unseen TokenWaste AI agents silentlyoverconsume tokens,increasing… Cache SystemPrompt Store and reuse theinitial systemprompt to avoid… Route byDifficulty Direct simplertasks to cheapermodels, complex to… Cut Token Costs Significantlyreduce AI agentoperational… From startuphub.ai · The publishers behind this format
Visual TL;DR, startuphub.ai Erik Hanchett identifies Unseen Token Waste. Unseen Token Waste solved by Cache System Prompt. Unseen Token Waste solved by Route by Difficulty. Unseen Token Waste solved by Offload Big Tool Results. Unseen Token Waste solved by Cap Tool Loops. Cache System Prompt leads to Cut Token Costs. Route by Difficulty leads to Cut Token Costs. Offload Big Tool Results leads to Cut Token Costs. Cap Tool Loops leads to Cut Token Costs. Cut Token Costs enables Optimize Performance identifies solved by solved by solved by solved by leads to leads to leads to leads to enables Erik Hanchett AWS Developer Advocate sharing AI agentcost-saving strategies Unseen Token Waste AI agents silently overconsume tokens,increasing operational costs Cache System Prompt Store and reuse the initial system promptto avoid repetition Route by Difficulty Direct simpler tasks to cheaper models,complex to powerful ones Offload Big Tool Results Process large tool outputs separately tosave agent tokens Cap Tool Loops Limit the number of times tools can becalled in a loop Cut Token Costs Significantly reduce AI agent operationalexpenditure Optimize Performance Improve AI agent efficiency andscalability From startuphub.ai · The publishers behind this format
Visual TL;DR, startuphub.ai Erik Hanchett identifies Unseen Token Waste. Unseen Token Waste solved by Cache System Prompt. Unseen Token Waste solved by Route by Difficulty. Unseen Token Waste solved by Offload Big Tool Results. Unseen Token Waste solved by Cap Tool Loops. Cache System Prompt leads to Cut Token Costs. Route by Difficulty leads to Cut Token Costs. Offload Big Tool Results leads to Cut Token Costs. Cap Tool Loops leads to Cut Token Costs. Cut Token Costs enables Optimize Performance identifies solved by solved by solved by solved by leads to leads to leads to leads to enables Erik Hanchett AWS DeveloperAdvocate sharing AIagent cost-saving… Unseen TokenWaste AI agents silentlyoverconsume tokens,increasing… Cache SystemPrompt Store and reuse theinitial systemprompt to avoid… Route byDifficulty Direct simplertasks to cheapermodels, complex to… Offload Big ToolResults Process large tooloutputs separatelyto save agent… Cap Tool Loops Limit the number oftimes tools can becalled in a loop Cut Token Costs Significantlyreduce AI agentoperational… OptimizePerformance Improve AI agentefficiency andscalability From startuphub.ai · The publishers behind this format

Who Is Erik Hanchett?

Erik Hanchett is a Senior Developer Advocate at Amazon Web Services (AWS). In his role, he focuses on empowering developers to build and deploy applications, often by explaining complex cloud services and best practices. His expertise lies in making advanced technologies accessible and practical for development teams.

The Problem: Unseen Token Waste

Hanchett immediately dives into a common pitfall for developers building AI agents: the silent and often unnoticed overconsumption of tokens. This waste directly translates to higher operational costs and can impact the scalability and economic viability of AI-powered applications. He frames the presentation around five practical "fixes" to address this issue.

Fix 1: Cache the System Prompt

The first strategy addresses the repetitive sending of system prompts. Hanchett demonstrates a code snippet showing how to define an agent using AWS's Bedrock models. By setting a cache_prompt="default" parameter for the system prompt, the agent can store and reuse this information. This means the full, often lengthy, system prompt is only sent on the initial call. Subsequent interactions will use the cached version, saving tokens on every turn.

Fix 2: Route by Difficulty

Not all tasks require the most powerful and expensive language models. Hanchett suggests implementing a routing mechanism that dispatches tasks based on their complexity. His example shows a Python function pick_model(task) that checks if a task is simple. If it is, it returns a less expensive model like "claude-haiku." For more complex tasks, it defaults to a more capable, but costlier, model such as "claude-sonnet." This dynamic model selection ensures that developers are not overspending on simpler requests.

Fix 3: Offload Big Tool Results

When AI agents interact with tools, the output from these tools can be substantial, consuming many tokens. Hanchett highlights the inefficiency of sending the full, raw output of a tool back into the agent's context. For instance, a tool that retrieves a report might return 10,000 tokens worth of data. Instead, he advocates for offloading or summarizing these large results. The example shows a fetch_report function that retrieves data, stores it, and then uses a summarize function with a reference. Only the summarized output is then passed back to the agent, drastically reducing the token count sent in subsequent processing steps.

Fix 4: Cap Your Tool Loops

A critical issue that can lead to runaway token costs is when an agent gets stuck in an infinite loop of tool calls. Hanchett demonstrates how to prevent this by setting a max_iterations parameter on the agent. For example, setting max_iterations=8 will halt the agent's execution after eight consecutive tool calls, regardless of whether the task is complete. This acts as a safeguard against excessive token consumption from malfunctioning or looping agent behavior.

Fix 5: Trim the History

For conversational agents, maintaining a long history of interactions is crucial for context, but it can also lead to massive token usage. Hanchett introduces the concept of a "sliding window conversation manager." His example shows how to use a SlidingWindowConversationManager with a specified window_size=10. This approach means that only the last 10 turns of the conversation are kept in context and sent to the language model. Older messages are effectively dropped, preventing the context from growing indefinitely and significantly reducing token costs in long conversations.

Recap: Five Fixes for a Smaller Bill

Hanchett summarizes the five key strategies for optimizing AI agent token usage:

  • Cache the System Prompt: Stop resending the static prompt every turn.
  • Route by Difficulty: Use a cheaper model for simple calls.
  • Offload Big Tool Results: Keep 10k-token blobs out of context by summarizing them.
  • Cap Your Tool Loops: Implement max_iterations to prevent runaway calls.
  • Trim the History: Use a window or summarize old turns to manage context length.

By implementing these practical measures, developers can ensure their AI agents are not only effective but also cost-efficient, making them more sustainable for production environments.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.