The finding that "95% of AI pilots inside the enterprise fail," highlighted in a recent MIT report, captures one of the biggest obstacles to the widespread adoption of artificial intelligence: moving generative AI out of experimentation and into robust, scalable production. Amazon Web Services (AWS) aims to address this dilemma with its latest enhancements to Amazon Bedrock AgentCore, announced at the re:Invent conference and discussed by Matthew Berman. The announcements introduce three capabilities (Policy, Evaluations, and Episodic Memory) designed to bring trust, control, and continuous improvement to enterprise AI agents.
AWS positions AgentCore as "the most advanced agentic platform," a comprehensive suite of services for securely building and deploying highly capable agents at scale. Berman emphasizes that the new features are not bolt-on extras but are deeply integrated into the agentic framework, which he sees as what distinguishes AgentCore from other offerings on the market. The updates target two practical questions, how to trust AI and how to control it, and aim to move both from theoretical concerns to production-grade solutions.
The first enhancement, Policy in Amazon Bedrock AgentCore, provides deterministic, real-time enforcement that keeps agents within defined boundaries. Organizations can set explicit guardrails on agent behavior: what agents can access, which actions they can perform, and under what conditions. Policies can be written in natural language, which AgentCore automatically converts into Cedar, AWS's open-source language for authorization policies. This makes policies easier to create and audit without custom code, opening policy authoring to a broader set of enterprise users. Crucially, the system is designed to process "thousands of requests per second while maintaining operational speed agents need to act," so governance does not come at the cost of performance. Policy in AgentCore also draws on AWS's years of work in automated reasoning, offering a "verifiably correct" way to test even non-deterministic AI systems, a significant step toward predictable and compliant agent behavior.
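To make "deterministic enforcement" concrete, here is a minimal Python sketch of Cedar-style authorization semantics: every tool call is denied by default, a matching forbid rule always wins, and otherwise a matching permit allows the call. The ToolRequest and Rule types, the issue_refund tool, and the $500 limit are hypothetical illustrations chosen for this example; this is not the AgentCore Policy API, only a model of the kind of check it performs before an agent acts.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Roughly how a natural-language rule such as "the support agent may issue
# refunds of at most $500" could be written in Cedar (names are hypothetical):
#   permit (principal, action == Action::"issue_refund", resource)
#   when { context.amount <= 500 };


@dataclass
class ToolRequest:
    """A hypothetical tool call an agent wants to make."""
    principal: str                                          # identity of the calling agent
    action: str                                             # name of the tool being invoked
    context: Dict[str, Any] = field(default_factory=dict)  # attributes of this request


@dataclass
class Rule:
    """A hypothetical rule: an effect, the action it covers, and a 'when' condition."""
    effect: str                               # "permit" or "forbid"
    action: str
    condition: Callable[[ToolRequest], bool]


def is_authorized(request: ToolRequest, rules: List[Rule]) -> bool:
    """Cedar-style decision: deny by default; any matching forbid overrides all permits."""
    matching = [r for r in rules if r.action == request.action and r.condition(request)]
    if any(r.effect == "forbid" for r in matching):
        return False
    return any(r.effect == "permit" for r in matching)


RULES = [
    Rule("permit", "issue_refund", lambda req: req.context.get("amount", 0) <= 500),
    Rule("forbid", "issue_refund", lambda req: bool(req.context.get("account_flagged", False))),
]

small_refund = ToolRequest("support-agent", "issue_refund", {"amount": 120})
large_refund = ToolRequest("support-agent", "issue_refund", {"amount": 900})
flagged_refund = ToolRequest("support-agent", "issue_refund", {"amount": 120, "account_flagged": True})
unknown_tool = ToolRequest("support-agent", "delete_account")

print(is_authorized(small_refund, RULES))    # True: permitted, nothing forbids it
print(is_authorized(large_refund, RULES))    # False: no permit matches, default deny
print(is_authorized(flagged_refund, RULES))  # False: forbid overrides the permit
print(is_authorized(unknown_tool, RULES))    # False: no rule covers this tool
```

Because the decision is a pure function of the request and the rules, the same call always gets the same answer, which is the property that makes this kind of check auditable even when the agent proposing the call is a non-deterministic model.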
Next, AgentCore introduces Evaluations, addressing the need to measure and improve agent performance. Berman notes that evaluation is often treated as secondary in AI development, even though it is foundational. "You don't know if you're improving unless you're able to measure it," he says, stressing the value of establishing baselines and tracking progress against them. AgentCore Evaluations continuously assess agent responses across quality metrics including correctness, helpfulness, faithfulness, response relevance, coherence, and instruction following, along with safety metrics such as harmfulness and stereotyping. Evaluations can run on demand or continuously, giving deep observability into agent operations and letting organizations trace a problem back to the agent's initial decisions. The result is a workflow built on accountability and iterative improvement.
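Berman's point about baselines can be illustrated with a small sketch: average per-metric scores across evaluated responses, then flag any metric where a new agent version does worse than the baseline. It assumes some evaluator has already scored each response on a 0 to 1 scale; the score values, the tolerance, and the summarize and regressions helpers are hypothetical and are not part of AgentCore Evaluations. Safety metrics such as harmfulness are treated as lower-is-better.

```python
from statistics import mean
from typing import Dict, List

# Safety-style metrics where a LOWER score is better.
LOWER_IS_BETTER = {"harmfulness", "stereotyping"}


def summarize(runs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each metric across a non-empty list of scored agent responses."""
    if not runs:
        raise ValueError("need at least one scored response")
    return {metric: mean(run[metric] for run in runs) for metric in runs[0]}


def regressions(baseline: Dict[str, float], candidate: Dict[str, float],
                tolerance: float = 0.02) -> List[str]:
    """Return metrics where the candidate is worse than the baseline by more than `tolerance`."""
    worse = []
    for metric, base in baseline.items():
        delta = candidate[metric] - base
        if metric in LOWER_IS_BETTER:
            delta = -delta  # a rise in harmfulness counts as getting worse
        if delta < -tolerance:
            worse.append(metric)
    return sorted(worse)


# Hypothetical scores for the current (baseline) agent and a new candidate version.
baseline_runs = [
    {"correctness": 0.80, "helpfulness": 0.70, "harmfulness": 0.05},
    {"correctness": 0.90, "helpfulness": 0.75, "harmfulness": 0.05},
]
candidate_runs = [
    {"correctness": 0.60, "helpfulness": 0.80, "harmfulness": 0.20},
]

print(regressions(summarize(baseline_runs), summarize(candidate_runs)))
# ['correctness', 'harmfulness']: helpfulness improved, but two metrics regressed
```

The tolerance keeps ordinary noise between evaluation runs from being reported as a regression; in practice its value would depend on how many responses are scored and how stable the evaluator is.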
The third upgrade adds episodic functionality to AgentCore Memory. It lets agents learn from both successes and failures across many interactions, building knowledge over time instead of being confined to a single conversational thread. Agents can adapt their solutions based on patterns observed in similar past situations, becoming more effective with prolonged use. Unlike traditional memory systems scoped to a specific user or conversation, AgentCore's episodic memory is shared across the entire agent implementation. This gives agents a deeper, more persistent form of learning, letting them refine their task-completion strategies from a cumulative history of experience.
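The difference between per-conversation memory and agent-wide episodic memory can be sketched in a few lines: every attempt is recorded as an episode (task, outcome, lesson) in one store shared by the whole agent, and before a new task the agent recalls the most similar past episodes, failures included. Ranking by word-overlap (Jaccard) similarity is a deliberate simplification swapped in for this illustration; the Episode and EpisodicMemory classes are hypothetical and do not describe how AgentCore Memory actually stores or retrieves episodes.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Episode:
    """One hypothetical past attempt by the agent."""
    task: str        # what the agent was asked to do
    succeeded: bool  # outcome of the attempt
    lesson: str      # what to repeat or avoid next time


def _tokens(text: str) -> Set[str]:
    return set(text.lower().split())


class EpisodicMemory:
    """Agent-wide store: every episode is visible to later tasks, not just one user's thread."""

    def __init__(self) -> None:
        self._episodes: List[Episode] = []

    def record(self, episode: Episode) -> None:
        self._episodes.append(episode)

    def recall(self, task: str, k: int = 2) -> List[Episode]:
        """Return up to k past episodes whose task wording overlaps most with `task`."""
        query = _tokens(task)

        def similarity(ep: Episode) -> float:
            # Jaccard similarity: shared words divided by all distinct words.
            words = _tokens(ep.task)
            union = query | words
            return len(query & words) / len(union) if union else 0.0

        scored = [(similarity(ep), ep) for ep in self._episodes]
        ranked = sorted((pair for pair in scored if pair[0] > 0),
                        key=lambda pair: pair[0], reverse=True)
        return [ep for _, ep in ranked[:k]]


memory = EpisodicMemory()
memory.record(Episode("refund order placed with gift card", False,
                      "Gift-card refunds must go back to the card, not a bank account."))
memory.record(Episode("refund order placed with credit card", True,
                      "Refund to the original payment method."))
memory.record(Episode("update shipping address", True,
                      "Confirm the address before the order ships."))

hits = memory.recall("refund an order paid by gift card")
print([ep.task for ep in hits])
# ['refund order placed with gift card', 'refund order placed with credit card']
print(hits[0].succeeded)  # False: the earlier failure is surfaced first
```

Surfacing the failed gift-card attempt first is the point: the lesson from a past mistake reaches the next similar task regardless of which user or conversation produced it.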
What makes these advancements particularly significant is their integrated nature. Berman underscores this point: "The key is that all of these features, all of this functionality, is moved into the execution path. It is not an afterthought. This is built in to how you build agents now." Because policy enforcement, performance evaluation, and episodic memory are native to the AgentCore experience, enterprises can deploy AI agents with greater confidence and control from the outset. By offering a production-grade framework that tackles trust, control, and continuous improvement directly, AWS is aiming to resolve the enterprise rollout dilemma and pave the way for more reliable and impactful AI agent deployments.