
Soheil Feizi on Continual Learning for AI Agents

Soheil Feizi of RELAI explains the challenges and principles behind continual learning for AI agents, focusing on replayable, holistic, lifelong, and efficient improvements.

Jul 5 at 4:04 AM · 9 min read

Soheil Feizi presenting Continual Learning for AI Agents — AI Engineer

In the pursuit of more robust and adaptable AI agents, continual learning is paramount. Soheil Feizi, Founder & Chief Scientist at RELAI and Associate Professor in Computer Science at the University of Maryland, recently addressed this area in his presentation, "Continual Learning for AI Agents: From Failures to Durable Improvements," which outlined the challenges and principles of building AI agents that learn and improve over time without regressions.


Visual TL;DR. AI Agent Learning faces the Forgetting Problem, which Replayable Environments help solve. Replayable Environments enable the Three Improvement Layers, which are demonstrated in the Meridian benchmark and lead to Durable Improvements. AI Agent Learning is, in turn, inspired by the Human Learning Parallel.


AI Agent Learning: agents learn and improve from experiences without forgetting
Forgetting Problem: agents forget past knowledge when learning new tasks
Replayable Environments: environments that allow agents to revisit past experiences
Three Improvement Layers: model, harness, and memory, the levels at which an agent can be improved
Meridian Benchmark: a practical application of continual learning principles
Durable Improvements: AI agents that learn without regressions
Human Learning Parallel: emulating human interaction and feedback cycles


The Core of Continual Learning for AI Agents

Feizi began by drawing a parallel between human learning and the desired capabilities of AI agents. Humans learn from experience by interacting with the world and receiving feedback, a cycle that AI agents should ideally emulate. The goal of continual learning for AI agents is to enable them to continuously improve from their experiences without forgetting what they have already learned.

He identified two fundamental challenges: first, how to obtain useful feedback on an agent's performance, and second, how to act on that feedback to optimize the agent. In production, raw logs are not enough; they must be turned into actionable feedback. This can be done through automated analysis by LLMs or code, or through critiques from human domain experts.
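As a minimal sketch of the "automated analysis by code" route, the Python below scans a raw log and emits structured feedback records that say what went wrong and what a fix should do. The log format (a list of step dicts with a `role` field), the `error` field on tool results, the tool-call budget, and the names `Feedback` and `analyze_trace` are all hypothetical assumptions for illustration, not RELAI's schema.

```python
from dataclasses import dataclass

@dataclass
class Feedback:
    step: int          # index of the offending step in the trace
    issue: str         # what went wrong
    suggestion: str    # what a fix should do

def analyze_trace(trace: list[dict], max_tool_calls: int = 5) -> list[Feedback]:
    """Turn a raw log (a list of step dicts) into actionable feedback via code checks."""
    feedback: list[Feedback] = []

    # Check 1: failed tool calls (hypothetical "error" field on tool results).
    for i, step in enumerate(trace):
        if step.get("role") == "tool_result" and "error" in step:
            feedback.append(Feedback(i, f"tool error: {step['error']}",
                                     "handle or retry the failing tool"))

    # Check 2: tool-call budget, used here as a rough proxy for latency.
    calls = [i for i, s in enumerate(trace) if s.get("role") == "tool_call"]
    if len(calls) > max_tool_calls:
        feedback.append(Feedback(calls[max_tool_calls],
                                 f"{len(calls)} tool calls exceed budget of {max_tool_calls}",
                                 "consolidate lookups before acting"))
    return feedback
```

An LLM-based analyzer would produce the same kind of record from a prompt instead of hand-written checks; the point is that the output is specific enough to act on.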

The Need for Replayable Learning Environments

A significant hurdle is that logs and feedback alone, while informative, are not testable. Feizi emphasized the need for a replayable learning environment: a simulation that can be rerun and graded against an explicit definition of success. Such an environment consists of:

Observed trace + feedback: Recreating what happened during a specific interaction.
Mocked/real tools: Simulating the tools the agent calls.
Synthetic user: Replaying the interaction with a simulated user.
Evaluators: Defining success metrics to score the agent's performance.

The output of this process is an executable simulation against which candidate agents can be tested, ensuring that fixes are only kept if they pass the evaluation.
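Under heavy simplification, a replayable environment can be sketched as a bundle of a synthetic user turn, mocked tools, and evaluators that score any candidate agent. Everything here (`ReplayableEnv`, `keep_fix`, the single-turn user, the agent signature, and the order-lookup scenario) is a hypothetical illustration, not the RELAI implementation.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical agent signature: takes the user's message and a dict of tools,
# returns the agent's final reply.
Agent = Callable[[str, dict], str]

@dataclass
class ReplayableEnv:
    user_turn: str                                      # synthetic user replays the logged request
    mocked_tools: dict[str, Callable]                   # deterministic stand-ins for real tools
    evaluators: list[Callable[[str, list[str]], bool]]  # success criteria derived from feedback

    def run(self, agent: Agent) -> float:
        """Replay the interaction; return the fraction of evaluators that pass."""
        calls: list[str] = []

        def record(name: str, fn: Callable) -> Callable:
            # Wrap each mocked tool so the environment sees which tools were called.
            def wrapped(*args, **kwargs):
                calls.append(name)
                return fn(*args, **kwargs)
            return wrapped

        tools = {name: record(name, fn) for name, fn in self.mocked_tools.items()}
        reply = agent(self.user_turn, tools)
        return sum(ev(reply, calls) for ev in self.evaluators) / len(self.evaluators)

def keep_fix(env: ReplayableEnv, candidate: Agent) -> bool:
    """Keep a candidate only if it passes every evaluator in the replayed environment."""
    return env.run(candidate) == 1.0

# Environment built from a hypothetical logged failure: the agent guessed an
# order status instead of looking it up.
env = ReplayableEnv(
    user_turn="Where is order 42?",
    mocked_tools={"lookup_order": lambda order_id: {"id": order_id, "status": "shipped"}},
    evaluators=[
        lambda reply, calls: "lookup_order" in calls,  # must consult the source of truth
        lambda reply, calls: "shipped" in reply,       # must report the true status
    ],
)

def baseline(msg: str, tools: dict) -> str:
    return "Your order is still processing."

def candidate(msg: str, tools: dict) -> str:
    order = tools["lookup_order"](42)
    return f"Your order {order['id']} has {order['status']}."
```

Because the mocked tools are deterministic, the same logged failure can be replayed against every new candidate: here `env.run(baseline)` scores 0.0 and `env.run(candidate)` scores 1.0, so only the candidate would be kept.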

Three Layers of Agent Improvement

Feizi detailed three key layers where agents can be improved through continual learning:

Model: This involves updating the weights of the LLM or other underlying models. Methods like Supervised Fine-Tuning (SFT), Reinforcement Learning (RL) post-training (e.g., DPO, GRPO, RLVR), and Low-Rank Adaptation (LoRA) fall into this category. These are typically the most expensive methods.
Harness: This layer focuses on modifying the prompts, skills, and code surrounding the model. Techniques like "Trace-to-harness" (where a coding agent rewrites prompts or tools based on logs and feedback) and "GEPA & prompt search" (evolutionary optimization of the harness) are examples. These offer more flexibility and are generally less costly than model updates.
Memory: This layer involves writing down facts and distilling skills so the agent doesn't have to rediscover them. Methods include "Information memory" (storing facts or corrections) and "Skill distillation" (compressing successful trajectories into reusable skill packets). These are generally the cheapest and fastest, though often unverified.

Feizi stressed that a good learning engine should aim for the smallest durable change at the right level to be most effective.
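One way to read "smallest durable change" is as a search that tries the cheapest layer first and stops at the first fix that survives verification. The sketch below assumes candidate fixes have already been proposed per layer; the cost numbers, `CandidateFix`, and `smallest_durable_fix` are hypothetical, and only the cost ordering (memory cheapest, model updates most expensive) comes from the talk.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Illustrative relative costs following the talk's ordering; not measured numbers.
LAYER_COST = {"memory": 1, "harness": 10, "model": 100}

@dataclass
class CandidateFix:
    layer: str                    # "memory", "harness", or "model"
    description: str
    apply: Callable[[], object]   # hypothetical: returns the patched agent

def smallest_durable_fix(
    candidates: list[CandidateFix], verify: Callable[[object], bool]
) -> Optional[CandidateFix]:
    """Try candidate fixes cheapest layer first; return the first that verifies."""
    for fix in sorted(candidates, key=lambda f: LAYER_COST[f.layer]):
        patched_agent = fix.apply()
        if verify(patched_agent):   # e.g. replay the failing environment(s)
            return fix
    return None                     # no candidate passed verification
```

Ordering by cost only makes sense among fixes that target the diagnosed root cause: a memory note that papers over a model-level weakness should fail `verify` rather than win on price, and more expensive layers are never touched once a cheaper fix holds.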

Principles of Practical Verifiable Continual Learning

He then summarized the four core principles of a practical Verifiable Continual Learning (VCL) approach:

Replayable: Turn logs and feedback into testable learning environments. This addresses issues with feedback quality.
Holistic: Route each fix to the appropriate layer (model, harness, or memory) so that it addresses the root cause of a failure. This gets the routing of fixes right.
Lifelong: Improve the agent continuously without regressing on previously learned behaviors or capabilities. This guards against regressions.
Efficient: Pick the smallest fix that works, allowing the learning loop to run continuously and cost-effectively. This ensures practicality.
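The lifelong principle amounts to a gate on every candidate update. Here is a minimal sketch, assuming each replayable environment yields a score in [0, 1]: a candidate is accepted only if it improves the environment it targets and does not fall below the baseline anywhere else. The function name, the environment names in the usage, and the `tolerance` parameter are hypothetical.

```python
def accept_candidate(
    baseline: dict[str, float],
    candidate: dict[str, float],
    target: str,
    tolerance: float = 0.0,
) -> bool:
    """Accept a candidate only if it improves `target` without regressing elsewhere.

    `baseline` and `candidate` map environment names to scores in [0, 1].
    """
    if candidate[target] <= baseline[target]:
        return False                      # the fix did not fix anything
    for env, old_score in baseline.items():
        if env != target and candidate[env] < old_score - tolerance:
            return False                  # regression on previously learned behavior
    return True
```

A nonzero `tolerance` lets small fluctuations from noisy evaluators pass; setting it to zero enforces strict non-regression across every stored environment.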

Feizi also highlighted the RELAI CLI as a tool to add VCL to agents in just two commands, simplifying the process of initializing an agent, creating learning environments from logs and feedback, and optimizing the agent using the VCL principles.

A Benchmark in Practice: The Meridian Support Agent

To illustrate these concepts, Feizi presented a case study of the "Meridian Support Agent," a tool-using support agent tested in a benchmark environment with:

A single source of truth: Fixed company policies and databases defined correct actions.
Interacting policies: Rules for refunds, escalations, and disclosures that could constrain each other.
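Interacting policies like these are what the evaluators in a replayable environment have to encode. The sketch below grades one episode against three hypothetical rules that constrain each other (no direct refund above a limit, a required escalation above that limit, and a disclosure whenever a case is escalated), plus a tool-call budget as a latency proxy. The limit, the budget, the tool names, and `grade` itself are assumptions; the actual Meridian policies are not given in the talk summary.

```python
# Hypothetical policy parameters, chosen only for illustration.
REFUND_LIMIT = 100.0
MAX_TOOL_CALLS = 4

def grade(requested_amount: float, actions: list[str]) -> dict[str, bool]:
    """Check one episode's tool calls against interacting support policies."""
    over_limit = requested_amount > REFUND_LIMIT
    escalated = "escalate" in actions
    return {
        # Rule 1: never issue a refund directly when the amount is over the limit.
        "no_direct_refund_over_limit": not (over_limit and "issue_refund" in actions),
        # Rule 2 (interacts with rule 1): over the limit, the case must be escalated.
        "escalated_when_required": (not over_limit) or escalated,
        # Rule 3 (interacts with rule 2): any escalation must come with a disclosure.
        "disclosed_on_escalation": (not escalated) or "send_disclosure" in actions,
        # Latency proxy: stay within the tool-call budget.
        "within_tool_budget": len(actions) <= MAX_TOOL_CALLS,
    }
```

Returning one boolean per rule, rather than a single pass/fail, is what makes it possible to report per-category weaknesses such as escalation versus latency.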

The agent's initial performance held up well on forbidden direct refunds and safety disclosure but showed weaknesses in required escalation (failing to route unauthorized refunds) and latency budget (too many turns or tool calls under pressure). After optimization with the VCL principles and the RELAI tools, the agent improved by 10 percentage points on average across environments, raising its average score from 87% to 97%. The key change involved canonicalizing refund-escalation arguments and building regression controls into the optimization loop itself.
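The reported numbers are easy to misread, so it helps to work them through. Taking the 87% and 97% average scores at face value, and assuming the score is the share of checks passed:

```latex
\begin{aligned}
\Delta_{\text{abs}} &= 97\% - 87\% = 10 \text{ percentage points} \\
\Delta_{\text{rel}} &= \frac{97 - 87}{87} \approx 11.5\% \\
\text{failure rate: } 13\% \rightarrow 3\%, &\quad \frac{13 - 3}{13} \approx 77\% \text{ fewer failures}
\end{aligned}
```

In other words, the headline gain is 10 percentage points (about 11.5% relative to the baseline), and the residual failure rate shrinks by roughly three quarters.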

Ultimately, Feizi's presentation underscored the importance of moving beyond simple fine-tuning to a more rigorous, verifiable, and efficient approach to continual learning for AI agents, ensuring they not only improve but do so reliably and sustainably.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.

#AI Research #Soheil Feizi #RELAI #Continual Learning #AI Agents #Machine Learning #Reinforcement Learning #LLMs #Software Engineering
