An overview of the Cognitive Memory Agent's layered memory structure. · LinkedIn Engineering

LinkedIn's AI Memory Platform

LinkedIn's Cognitive Memory Agent (CMA) provides AI agents with context and memory for personalized, adaptive experiences, starting with its Hiring Assistant.

May 22 at 1:01 AM · 9 min read

LinkedIn's generative AI ambitions extend beyond powerful models. To deliver adaptive, personalized experiences, especially in tools like its Hiring Assistant, the company recognized that AI agents need robust memory. This led to the development of the Cognitive Memory Agent (CMA), a foundational platform for building stateful, context-aware AI agents at scale.

Visual TL;DR. The need for AI memory leads to the Cognitive Memory Agent (CMA). CMA combines multiple memory layers, ingestion and retrieval, and intelligent context management. Together these enable adaptive experiences, which power the personalized Hiring Assistant.

Need for AI Memory: AI agents need context and memory for personalized experiences
Cognitive Memory Agent (CMA): LinkedIn's platform for stateful, context-aware AI agents at scale
Multiple Memory Layers: Different knowledge depths for sophisticated personalization and understanding
Ingestion & Retrieval: Optimizing for performance and privacy in accessing memory
Intelligent Context Management: CMA intelligently manages context, unlike traditional memory systems
Adaptive Experiences: Enables AI agents to learn and improve over time
Personalized Hiring Assistant: Starting application of CMA for enhanced user interactions


Unlike traditional memory systems that require explicit user input, CMA intelligently manages context. It leverages multiple memory stores, each offering different knowledge depths, to enable sophisticated personalization. This approach is key to building AI agents that learn and improve over time, moving beyond the limitations of a simple context window.

The CMA Architecture: Layers of Intelligence

At its core, CMA is built upon three primary components: distinct memory layers, an ingestion process, and a sophisticated retrieval orchestration layer. This structure allows application agents to maintain continuity across interactions, learn dynamically, and compose tool usage effectively.

The memory layers encompass conversational, episodic, semantic, and procedural memory. Each layer is exposed through tool abstractions, providing agents with a versatile toolkit for accessing information.
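
As a rough illustration of what exposing each layer through a tool abstraction could look like, here is a minimal Python sketch. The article does not describe CMA's actual interfaces, so `MemoryTool`, `ToolRegistry`, the tool name `search_semantic`, and the substring matcher are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class MemoryLayer(Enum):
    CONVERSATIONAL = "conversational"
    EPISODIC = "episodic"
    SEMANTIC = "semantic"
    PROCEDURAL = "procedural"


@dataclass
class MemoryTool:
    """Hypothetical tool wrapper: a name, the layer it reads, and a lookup function."""
    name: str
    layer: MemoryLayer
    lookup: Callable[[str], List[str]]


@dataclass
class ToolRegistry:
    """Hypothetical registry an agent would use to call memory tools by name."""
    tools: Dict[str, MemoryTool] = field(default_factory=dict)

    def register(self, tool: MemoryTool) -> None:
        self.tools[tool.name] = tool

    def call(self, name: str, query: str) -> List[str]:
        return self.tools[name].lookup(query)


def make_substring_lookup(records: List[str]) -> Callable[[str], List[str]]:
    # Toy case-insensitive substring matcher; a real store would use an index.
    def lookup(query: str) -> List[str]:
        return [r for r in records if query.lower() in r.lower()]
    return lookup


registry = ToolRegistry()
registry.register(MemoryTool(
    name="search_semantic",
    layer=MemoryLayer.SEMANTIC,
    lookup=make_substring_lookup(
        ["Company does not sponsor visas", "Remote hiring allowed"]
    ),
))
print(registry.call("search_semantic", "visa"))
```

The point of the sketch is only that the agent sees a uniform call surface while each layer keeps its own storage behind it.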

An ingestion layer processes unstructured inputs, extracts relevant information, and determines the optimal storage method. This ensures data is prepared for efficient retrieval.
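
The ingestion flow described above (clean the input, extract what matters, choose a store) might be sketched as follows. This is an illustration under stated assumptions: the keyword rules in `classify` stand in for whatever extraction CMA actually performs, and the `MemoryRecord` type and store names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MemoryRecord:
    layer: str  # which store the record is routed to
    text: str


def classify(text: str) -> str:
    """Keyword rules standing in for the extraction step (an assumption;
    the article does not say how the target layer is chosen)."""
    lowered = text.lower()
    if "always" in lowered or "policy" in lowered:
        return "semantic"
    if "first" in lowered and "then" in lowered:
        return "procedural"
    return "episodic"


def ingest(raw_inputs: List[str]) -> Dict[str, List[MemoryRecord]]:
    stores: Dict[str, List[MemoryRecord]] = {
        "semantic": [], "procedural": [], "episodic": []
    }
    for raw in raw_inputs:
        text = " ".join(raw.split())  # normalize whitespace
        if not text:
            continue  # skip empty inputs
        layer = classify(text)
        stores[layer].append(MemoryRecord(layer, text))
    return stores
```

For example, `ingest(["Company policy: no visa sponsorship"])` routes the record to the `semantic` store under these toy rules.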

The retrieval orchestration layer does the heaviest lifting. It infers user intent from natural language, dynamically fetches relevant memories across all layers, and synthesizes coherent responses. This goes beyond basic embedding retrieval by incorporating reasoning and planning, which yields higher-quality, contextually relevant outputs.
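
To make the infer-fetch-synthesize sequence concrete, here is a deliberately simplified Python sketch. Everything in it is an assumption for illustration: the intent labels, the `INTENT_TO_LAYERS` mapping, and the keyword rules, which stand in for the LLM-driven reasoning the article describes.

```python
from typing import Dict, List

# Hypothetical mapping from a coarse intent label to the layers worth querying.
INTENT_TO_LAYERS: Dict[str, List[str]] = {
    "draft_job_description": ["semantic", "procedural"],
    "review_candidate": ["episodic", "semantic"],
    "general": ["conversational"],
}


def infer_intent(utterance: str) -> str:
    """Keyword rules standing in for LLM-based intent inference."""
    lowered = utterance.lower()
    if "job description" in lowered:
        return "draft_job_description"
    if "candidate" in lowered:
        return "review_candidate"
    return "general"


def retrieve(utterance: str, stores: Dict[str, List[str]]) -> str:
    intent = infer_intent(utterance)
    memories: List[str] = []
    for layer in INTENT_TO_LAYERS[intent]:
        memories.extend(stores.get(layer, []))
    # "Synthesis" here is plain concatenation, not a generated response.
    return f"[{intent}] " + "; ".join(memories)
```

Only the layers relevant to the inferred intent are queried, which is the part of the design this sketch is meant to show.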

Memory Layers: Building a Richer Understanding

CMA differentiates itself by maintaining multiple types of memory, each tailored for specific needs and offering distinct latency characteristics.

Conversational memory captures the immediate state of an ongoing dialogue. It stores and indexes prior turns, enabling future interactions to incorporate relevant history without exceeding context limits. This is achieved through a combination of chronological logs and semantic indexes, with periodic summarization for context compression.
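
A minimal sketch of a chronological log with periodic summarization, assuming a hypothetical `max_recent` threshold and a truncating placeholder in place of an LLM summarizer. The semantic index mentioned above is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List


def summarize(turns: List[str]) -> str:
    """Placeholder summarizer: keeps the first 20 characters of each turn.
    A production system would call a model here instead."""
    return " | ".join(t[:20] for t in turns)


@dataclass
class ConversationMemory:
    max_recent: int = 4  # hypothetical threshold that triggers compression
    summaries: List[str] = field(default_factory=list)
    recent: List[str] = field(default_factory=list)

    def add_turn(self, turn: str) -> None:
        self.recent.append(turn)
        if len(self.recent) > self.max_recent:
            # Fold the older half into a summary; keep the newer half verbatim.
            cut = len(self.recent) // 2
            self.summaries.append(summarize(self.recent[:cut]))
            self.recent = self.recent[cut:]

    def context(self) -> str:
        # Summaries first, then verbatim recent turns, in chronological order.
        return "\n".join(self.summaries + self.recent)
```

The context returned to the model stays bounded because older turns survive only in compressed form.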

Beyond immediate conversations lies long-term memory, which allows agents to accumulate durable knowledge about users and their environments across sessions. This layer has evolved significantly from earlier, more rudimentary key-value stores.

Long-term memory is further segmented into three sub-categories, mirroring cognitive models:

Episodic memory records specific past events and interactions. It's timestamped and contextual, enabling agents to reference similar activities within a given timeframe. This builds situational awareness and refines agent behavior based on past signals, such as a recruiter archiving a candidate lacking specific skills.
Semantic memory aggregates preferences and generalized knowledge derived from repeated interactions. This layer abstracts specific events into broader patterns, like a company's policy on visa sponsorship or remote hiring, which can inform future actions like drafting job descriptions.
Procedural memory influences the execution strategy by identifying user-specific workflows and steps. It captures implicit preferences in how a user accomplishes tasks, such as a recruiter's preferred candidate filtering sequence or outreach template usage.
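
One way to picture the step from episodic to semantic memory is to promote a recurring signal into a generalized preference. The sketch below does this with a simple count. The `Episode` fields, the `archive` action name, and the `min_count` threshold are all assumptions made for illustration, not details from LinkedIn.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Episode:
    timestamp: datetime  # episodic records are timestamped
    action: str          # e.g. "archive" or "shortlist"
    reason: str          # context attached to the action


def derive_preferences(episodes: List[Episode], min_count: int = 2) -> List[str]:
    """Promote an archive reason to a generalized preference once it has
    been observed at least min_count times."""
    counts = Counter(e.reason for e in episodes if e.action == "archive")
    return sorted(reason for reason, n in counts.items() if n >= min_count)
```

A recruiter who archives two candidates for the same missing skill would, under this toy rule, produce one semantic entry for that skill.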

Together, these memory types enable agents to understand user workflows, past events, and enduring environmental facts, leading to a high degree of adaptation and personalization. As an agent is used more, it becomes "smarter," aligning with user behavior without constant explicit reminders.

Ingestion and Retrieval: Optimizing for Performance and Privacy

To keep retrieval latency low, most data processing is moved into the ingestion phase. LLMs summarize patterns, extract episodic activities, and compress conversational memory at ingestion time, with privacy-preserving techniques applied throughout.

The system employs both streaming and batch processing for asynchronous indexing. Streaming handles latency-sensitive tasks like conversational summarization, while batch processing manages computation-intensive tasks such as extracting semantic memory nodes.
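
The streaming and batch split can be sketched as a scheduler that handles latency-sensitive tasks immediately and queues the rest. The `IndexTask` and `IndexScheduler` names and the boolean `latency_sensitive` flag are hypothetical; the article does not describe how tasks are classified.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class IndexTask:
    name: str
    latency_sensitive: bool


class IndexScheduler:
    def __init__(self) -> None:
        self.batch_queue: Deque[IndexTask] = deque()
        self.processed: List[str] = []  # log of completed work, in order

    def submit(self, task: IndexTask) -> None:
        if task.latency_sensitive:
            # Streaming path: handle right away.
            self.processed.append(f"stream:{task.name}")
        else:
            # Batch path: defer until the next batch run.
            self.batch_queue.append(task)

    def run_batch(self) -> int:
        """Drain the batch queue and return how many tasks were processed."""
        count = len(self.batch_queue)
        while self.batch_queue:
            task = self.batch_queue.popleft()
            self.processed.append(f"batch:{task.name}")
        return count
```

In this sketch, conversational summarization would be submitted with `latency_sensitive=True`, while semantic node extraction would wait for `run_batch`.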

LinkedIn indexes semantic memory hierarchically, using LLM calls to convert activity data into Q&A pairs and summaries. Compared with flat indexing, this structure reduces the number of LLM calls and speeds up retrieval, and its tree-like design allows subtrees to be processed in parallel for scalability.
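
A toy version of a hierarchical index helps show why a tree can prune work: a search descends only into subtrees whose summary matches. The `Node` layout and the keyword matching below are assumptions for illustration and say nothing about how LinkedIn builds or scores its index.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    summary: str                                   # summary of everything below
    qa_pairs: List[Tuple[str, str]] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def search(node: Node, keyword: str) -> List[Tuple[str, str]]:
    """Descend only into subtrees whose summary mentions the keyword."""
    needle = keyword.lower()
    if needle not in node.summary.lower():
        return []  # prune this whole subtree
    hits = [qa for qa in node.qa_pairs if needle in qa[0].lower()]
    for child in node.children:
        hits.extend(search(child, keyword))
    return hits


root = Node(
    summary="Hiring policies: visa sponsorship, remote work",
    children=[
        Node("Visa sponsorship policy",
             qa_pairs=[("Does the company sponsor visas?", "No")]),
        Node("Remote work policy",
             qa_pairs=[("Is remote hiring allowed?", "Yes")]),
    ],
)
print(search(root, "visa"))
```

Because sibling subtrees are independent, they could also be searched or rebuilt in parallel, which matches the scalability argument above.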

Retrieval in CMA is not a static search. It's a dynamic reasoning process orchestrated by a lightweight agent. This orchestrator intelligently plans how to access and combine information from different memory layers, determining the optimal order and reconciliation strategy across stores. This adaptive retrieval is crucial for handling the layered, heterogeneous, and evolving nature of CMA's memory stores.
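
The ordering and reconciliation steps could look something like the sketch below. Both policies are assumptions chosen for simplicity: a fixed `LAYER_ORDER` for planning, and "most recently updated wins" for reconciling conflicting facts. The article does not state which strategies CMA's orchestrator actually uses.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Fact:
    key: str
    value: str
    layer: str
    updated_at: int  # e.g. a Unix timestamp


# Hypothetical fixed query order across stores.
LAYER_ORDER = ["conversational", "episodic", "semantic", "procedural"]


def plan(layers_needed: List[str]) -> List[str]:
    """Return the requested layers in the order they should be queried."""
    return [layer for layer in LAYER_ORDER if layer in layers_needed]


def reconcile(facts: List[Fact]) -> Dict[str, Fact]:
    """When stores disagree on a key, keep the most recently updated fact."""
    resolved: Dict[str, Fact] = {}
    for fact in facts:
        current = resolved.get(fact.key)
        if current is None or fact.updated_at > current.updated_at:
            resolved[fact.key] = fact
    return resolved
```

For instance, if semantic memory says remote hiring is not allowed but a newer conversational turn says it is, this rule keeps the conversational value.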

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.
