Jeff Crume, PhD, a Distinguished Engineer at IBM, offers a precise and illuminating dissection of AI agents in his presentation, "Anatomy of AI Agents: Inside LLMs, RAG Systems, & Generative AI." Crume's core thesis centers on breaking down these intelligent systems into three fundamental, interconnected components: sensing, thinking, and acting. He illustrates how data from the real world is absorbed, processed into decisions, and subsequently translated into tangible actions, all while continuously learning and adapting.
The journey of an AI agent begins with "sensing," its mechanism for perceiving the external environment. Crume explains that this perception takes several forms. For a chatbot, it is text input processed through natural language processing. For more complex systems such as autonomous vehicles, it means integrating data from many sensors, such as cameras and microphones. Agents can also receive information through APIs and triggered events. Together, these inputs act as the agent's digital eyes and ears, gathering the data it needs to operate.
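One way to picture the sensing stage is as a set of adapters that turn very different inputs into a single shape the rest of the agent can consume. The sketch below is illustrative only and is not taken from Crume's presentation: the `Percept` type and the `sense_text`, `sense_sensor`, and `sense_event` helpers are hypothetical names, and the whitespace tokenizer is a stand-in for real natural language processing.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Percept:
    """One normalized observation handed to the thinking stage (hypothetical type)."""
    source: str                                   # "text", "sensor", or "event"
    payload: Dict[str, Any] = field(default_factory=dict)


def sense_text(message: str) -> Percept:
    # Stand-in for real NLP: lowercase the message and split on whitespace.
    tokens = message.lower().split()
    return Percept(source="text", payload={"raw": message, "tokens": tokens})


def sense_sensor(name: str, reading: float) -> Percept:
    # A camera, microphone, or other device would be wrapped the same way.
    return Percept(source="sensor", payload={"name": name, "reading": reading})


def sense_event(event_type: str, body: Dict[str, Any]) -> Percept:
    # Covers API callbacks and triggered events; the body is copied, not shared.
    return Percept(source="event", payload={"type": event_type, "body": dict(body)})
```

Because every adapter returns the same `Percept` shape, the thinking stage does not need to know whether an observation came from a user message, a device, or a triggered event.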
Once information is sensed, it moves into the "thinking" phase, the cognitive core of the AI agent. This stage draws on external knowledge and predefined policies rather than on the model alone. Crume highlights the need for a "knowledge base" where the agent can access stored "facts, rules, and context," drawing from sources such as databases or Retrieval-Augmented Generation (RAG) systems. This external grounding keeps the agent from relying solely on the data it was trained on and gives it up-to-date, domain-specific information.
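The grounding idea can be sketched with a toy knowledge base that holds facts, rules, and context and hands the relevant entries to the model alongside the question. This is a minimal sketch under stated assumptions, not the design described in the presentation: `Entry`, `KnowledgeBase`, and `build_prompt` are hypothetical names, and plain word overlap stands in for the embedding-based search a production RAG system would typically use.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Entry:
    kind: str   # "fact", "rule", or "context"
    text: str


def _words(text: str) -> Set[str]:
    # Lowercase alphanumeric tokens, so punctuation does not block a match.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class KnowledgeBase:
    """Toy store; retrieval ranks entries by word overlap (an assumption)."""

    def __init__(self, entries: List[Entry]) -> None:
        self.entries = entries

    def retrieve(self, query: str, top_k: int = 2) -> List[Entry]:
        query_words = _words(query)
        scored = []
        for entry in self.entries:
            overlap = len(query_words & _words(entry.text))
            if overlap > 0:
                scored.append((overlap, entry))
        # Highest overlap first; the sort is stable, so ties keep insertion order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [entry for _, entry in scored[:top_k]]


def build_prompt(question: str, kb: KnowledgeBase) -> str:
    # Prepend the retrieved entries so the model answers from them.
    retrieved = kb.retrieve(question)
    context = "\n".join(f"- [{e.kind}] {e.text}" for e in retrieved)
    return f"Use only the context below.\nContext:\n{context}\nQuestion: {question}"
```

Calling `build_prompt` with a question returns a prompt string containing only the entries that share words with that question. An agent would pass that string to its language model, which is how stored knowledge rather than training data alone shapes the answer.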
