Soheil Feizi on Continual Learning for AI Agents

Soheil Feizi of RELAI explains the challenges and principles behind continual learning for AI agents, focusing on replayable, holistic, lifelong, and efficient improvements.

9 min read
Soheil Feizi presenting Continual Learning for AI Agents
AI Engineer

In the pursuit of more robust and adaptable AI agents, the concept of continual learning is paramount. Soheil Feizi, Founder & Chief Scientist at RELAI and Associate Professor in Computer Science at the University of Maryland, recently delved into this critical area. His presentation, "Continual Learning for AI Agents: From Failures to Durable Improvements," outlined the challenges and principles behind building AI agents that can learn and improve over time without regressions.

Soheil Feizi on Continual Learning for AI Agents - AI Engineer
Soheil Feizi on Continual Learning for AI Agents — from AI Engineer

Visual TL;DR. AI Agent Learning faces Forgetting Problem. Forgetting Problem solved by Replayable Environments. Replayable Environments enables Three Improvement Layers. Three Improvement Layers demonstrated in Benchmark: Meridian. Three Improvement Layers leads to Durable Improvements. Human Learning Parallel inspired by AI Agent Learning.

Related startups

  1. AI Agent Learning: agents learn and improve from experiences without forgetting
  2. Forgetting Problem: agents forget past knowledge when learning new tasks
  3. Replayable Environments: environments that allow agents to revisit past experiences
  4. Three Improvement Layers: holistic, lifelong, and efficient agent improvements
  5. Benchmark: Meridian: a practical application of continual learning principles
  6. Durable Improvements: AI agents that learn without regressions
  7. Human Learning Parallel: emulating human interaction and feedback cycles
Visual TL;DR
Visual TL;DR, startuphub.ai AI Agent Learning faces Forgetting Problem. Forgetting Problem solved by Replayable Environments. Replayable Environments enables Three Improvement Layers. Three Improvement Layers leads to Durable Improvements faces solved by enables leads to AI Agent Learning Forgetting Problem Replayable Environments Three Improvement Layers Durable Improvements From startuphub.ai · The publishers behind this format
Visual TL;DR, startuphub.ai AI Agent Learning faces Forgetting Problem. Forgetting Problem solved by Replayable Environments. Replayable Environments enables Three Improvement Layers. Three Improvement Layers leads to Durable Improvements faces solved by enables leads to AI Agent Learning ForgettingProblem ReplayableEnvironments Three ImprovementLayers DurableImprovements From startuphub.ai · The publishers behind this format
Visual TL;DR, startuphub.ai AI Agent Learning faces Forgetting Problem. Forgetting Problem solved by Replayable Environments. Replayable Environments enables Three Improvement Layers. Three Improvement Layers leads to Durable Improvements faces solved by enables leads to AI Agent Learning agents learn and improve from experienceswithout forgetting Forgetting Problem agents forget past knowledge when learningnew tasks Replayable Environments environments that allow agents to revisitpast experiences Three Improvement Layers holistic, lifelong, and efficient agentimprovements Durable Improvements AI agents that learn without regressions From startuphub.ai · The publishers behind this format
Visual TL;DR, startuphub.ai AI Agent Learning faces Forgetting Problem. Forgetting Problem solved by Replayable Environments. Replayable Environments enables Three Improvement Layers. Three Improvement Layers leads to Durable Improvements faces solved by enables leads to AI Agent Learning agents learn andimprove fromexperiences without… ForgettingProblem agents forget pastknowledge whenlearning new tasks ReplayableEnvironments environments thatallow agents torevisit past… Three ImprovementLayers holistic, lifelong,and efficient agentimprovements DurableImprovements AI agents thatlearn withoutregressions From startuphub.ai · The publishers behind this format
Visual TL;DR, startuphub.ai AI Agent Learning faces Forgetting Problem. Forgetting Problem solved by Replayable Environments. Replayable Environments enables Three Improvement Layers. Three Improvement Layers demonstrated in Benchmark: Meridian. Three Improvement Layers leads to Durable Improvements. Human Learning Parallel inspired by AI Agent Learning faces solved by enables demonstrated in leads to inspired by AI Agent Learning agents learn and improve from experienceswithout forgetting Forgetting Problem agents forget past knowledge when learningnew tasks Replayable Environments environments that allow agents to revisitpast experiences Three Improvement Layers holistic, lifelong, and efficient agentimprovements Benchmark: Meridian a practical application of continuallearning principles Durable Improvements AI agents that learn without regressions Human Learning Parallel emulating human interaction and feedbackcycles From startuphub.ai · The publishers behind this format
Visual TL;DR, startuphub.ai AI Agent Learning faces Forgetting Problem. Forgetting Problem solved by Replayable Environments. Replayable Environments enables Three Improvement Layers. Three Improvement Layers demonstrated in Benchmark: Meridian. Three Improvement Layers leads to Durable Improvements. Human Learning Parallel inspired by AI Agent Learning faces solved by enables demonstrated in leads to inspired by AI Agent Learning agents learn andimprove fromexperiences without… ForgettingProblem agents forget pastknowledge whenlearning new tasks ReplayableEnvironments environments thatallow agents torevisit past… Three ImprovementLayers holistic, lifelong,and efficient agentimprovements Benchmark:Meridian a practicalapplication ofcontinual learning… DurableImprovements AI agents thatlearn withoutregressions Human LearningParallel emulating humaninteraction andfeedback cycles From startuphub.ai · The publishers behind this format

The Core of Continual Learning for AI Agents

Feizi began by drawing a parallel between human learning and the desired capabilities of AI agents. Humans learn from experience by interacting with the world and receiving feedback, a cycle that AI agents should ideally emulate. The goal of continual learning for AI agents is to enable them to continuously improve from their experiences without forgetting what they have already learned.

He identified two fundamental challenges in achieving this: first, how to effectively get feedback on an agent's performance, and second, how to act upon that feedback to optimize the agent. In production environments, raw logs are not enough; they need to be transformed into actionable feedback. This can be achieved either through automated analysis by LLMs or code, or through critical human feedback from domain experts.

The Need for Replayable Learning Environments

A significant hurdle is that mere logs and feedback, while informative, are not inherently testable. Feizi emphasized the need for a replayable learning environment. This environment acts as a simulation that can be rerun with defined grading on what constitutes success. Such an environment would consist of:

  • Observed trace + feedback: Recreating what happened during a specific interaction.
  • Mocked/real tools: Simulating the tools the agent calls.
  • Synthetic user: Replaying the interaction with a simulated user.
  • Evaluators: Defining success metrics to score the agent's performance.

The output of this process is an executable simulation against which candidate agents can be tested, ensuring that fixes are only kept if they pass the evaluation.

Three Layers of Agent Improvement

Feizi detailed three key layers where agents can be improved through continual learning:

  1. Model: This involves updating the weights of the LLM or other underlying models. Methods like Supervised Fine-Tuning (SFT), Reinforcement Learning (RL) post-training (e.g., DPO, GRPO, RLVR), and Low-Rank Adaptation (LoRA) fall into this category. These are typically the most expensive methods.
  2. Harness: This layer focuses on modifying the prompts, skills, and code surrounding the model. Techniques like "Trace-to-harness" (where a coding agent rewrites prompts or tools based on logs and feedback) and "GEPA & prompt search" (evolutionary optimization of the harness) are examples. These offer more flexibility and are generally less costly than model updates.
  3. Memory: This layer involves writing down facts and distilling skills so the agent doesn't have to rediscover them. Methods include "Information memory" (storing facts or corrections) and "Skill distillation" (compressing successful trajectories into reusable skill packets). These are generally the cheapest and fastest, though often unverified.

Feizi stressed that a good learning engine should aim for the smallest durable change at the right level to be most effective.

Principles of Practical Verifiable Continual Learning

He then summarized the four core principles of a practical Verifiable Continual Learning (VCL) approach:

  1. Replayable: Turn logs and feedback into testable learning environments. This addresses issues with feedback quality.
  2. Holistic: Route each fix to the appropriate layer, model, harness, or memory, to address the root cause of a failure. This fixes the routing of solutions.
  3. Lifelong: Improve the agent continuously without regressing on previously learned behaviors or capabilities. This checks and avoids regression.
  4. Efficient: Pick the smallest fix that works, allowing the learning loop to run continuously and cost-effectively. This ensures practicality.

Feizi also highlighted the RELAI CLI as a tool to add VCL to agents in just two commands, simplifying the process of initializing an agent, creating learning environments from logs and feedback, and optimizing the agent using the VCL principles.

A Benchmark in Practice: The Meridian Support Agent

To illustrate these concepts, Feizi presented a case study of a "Meridian Support Agent." This agent, designed as a tool-using support agent, was tested in a benchmark environment with:

  • A single source of truth: Fixed company policies and databases defined correct actions.
  • Interacting policies: Rules for refunds, escalations, and disclosures that could constrain each other.

The agent's initial performance showed weaknesses in required escalation (failing to route unauthorized refunds) and latency budget (too many turns/tool calls under pressure), despite holding up well on forbidden direct refunds and safety disclosure. By using the VCL principles and the RELAI tools, the agent was optimized, resulting in a 10% average improvement across environments and an increase in average score from 87% to 97%. The key change involved canonicalizing refund-escalation arguments and implementing regression controls within the optimization loop itself.

Ultimately, Feizi's presentation underscored the importance of moving beyond simple fine-tuning to a more rigorous, verifiable, and efficient approach to continual learning for AI agents, ensuring they not only improve but do so reliably and sustainably.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.