The pursuit of generalist AI agents has long been hampered by the need for extensive human oversight in agent design and adaptation. Traditional methods require a bespoke agent architecture for each new task, which creates a bottleneck when AI capabilities need to be deployed quickly. Memento-Skills challenges this paradigm by presenting an agent-designing agent: a system that builds and refines task-specific agents itself.
Autonomous Agent Construction via Skill Evolution
Memento-Skills is a generalist LLM agent system that autonomously constructs, adapts, and improves specialized agents. At its core is a memory-based reinforcement learning framework built on stateful prompts. Reusable skills, stored as structured markdown files, act as a persistent, evolving memory that lets the agent carry knowledge across diverse interactions. The skill library starts from basic capabilities such as web search and terminal operations and grows from there. Improvement comes from a 'Read-Write Reflective Learning' mechanism. In the 'read' phase, a skill router selects the most relevant skill for the current stateful prompt. In the 'write' phase, the agent updates and expands the skill library based on new experience. This closed loop supports continual learning without updating the core LLM's parameters, because all adaptation happens in the externalized skills and prompts.
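The read and write phases described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the Memento-Skills implementation: the `Skill` and `SkillLibrary` names are hypothetical, the router is a simple word-overlap score standing in for whatever router the system actually uses, and skills are kept in memory and only rendered to markdown rather than written to files.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical skill record; to_markdown() mimics markdown-file storage."""
    name: str
    description: str
    notes: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        lines = [f"# {self.name}", "", self.description]
        if self.notes:
            lines += ["", "## Notes"] + [f"- {note}" for note in self.notes]
        return "\n".join(lines)


def tokenize(text: str) -> set[str]:
    """Lowercase whitespace tokenizer used by the toy router."""
    return set(text.lower().split())


class SkillLibrary:
    """In-memory skill store with a toy read/write loop (illustrative only)."""

    def __init__(self, skills: list[Skill]) -> None:
        self.skills = {skill.name: skill for skill in skills}

    def read(self, prompt: str) -> Skill:
        """Read phase: pick the skill whose description overlaps the prompt most."""
        words = tokenize(prompt)
        return max(
            self.skills.values(),
            key=lambda skill: len(words & tokenize(skill.description)),
        )

    def write(self, skill_name: str, success: bool, reflection: str,
              new_skill: Skill | None = None) -> None:
        """Write phase: on failure, record a reflection and optionally add a skill."""
        if success:
            return  # nothing to revise; the model's weights are never touched
        self.skills[skill_name].notes.append(reflection)
        if new_skill is not None:
            self.skills[new_skill.name] = new_skill


# Start from two basic skills, as the text describes.
library = SkillLibrary([
    Skill("web_search", "search the web for pages and facts"),
    Skill("terminal", "run shell commands in a terminal"),
])

chosen = library.read("search the web for the release date")
library.write(
    chosen.name,
    success=False,
    reflection="Prefer primary sources over aggregators.",
    new_skill=Skill("date_lookup", "find a release date from official pages"),
)
print(library.skills[chosen.name].to_markdown())
```

The point of the sketch is that learning happens entirely in the library: a failed attempt changes the stored skill text and may add a new skill, while the model that consumes those skills stays fixed.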
End-to-End Agent Design and Performance Breakthroughs
A key differentiator of Memento-Skills is that it lets a generalist agent design agents end-to-end for novel tasks rather than relying on human-designed agents. Through iterative skill generation and refinement, the agent progressively enhances its own capabilities. Experiments on the General AI Assistants (GAIA) benchmark and Humanity's Last Exam (HLE) show relative improvements in overall accuracy of 26.2% and 116.2%, respectively. This adaptive, skill-based approach offers a compelling path toward more dynamic and efficient AI deployment.
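Because the reported gains are relative rather than absolute, it may help to spell out how such a figure is usually computed. The sketch below uses the standard relative-change formula. The accuracies in it are hypothetical placeholders chosen for easy arithmetic, not numbers from the experiments, and the function name is an assumption.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a percentage of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical accuracies (in percentage points), NOT results from the source.
gain = relative_improvement(baseline=20.0, improved=30.0)      # 10 points on a base of 20
doubling = relative_improvement(baseline=10.0, improved=25.0)  # more than doubled
print(gain, doubling)
```

Under this definition, any relative improvement above 100% means the accuracy more than doubled against its baseline, which is how the larger of the two reported figures should be read.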