AI Agents Need Better Memories

Databricks research explores how AI agents can improve by accessing vast stores of past interactions and organizational knowledge, moving beyond just larger models.


Large language models can now reason through complex tasks, but only when they have the right context. The real bottleneck for AI agents is grounding them in relevant information, a challenge Databricks researchers are tackling with the concept of memory scaling. This approach posits that agent performance improves not just with bigger models, but with access to more relevant past data.

Memory scaling refers to an agent's ability to perform better as its external memory grows. This includes past conversations, user feedback, and interaction histories. Unlike parametric scaling (bigger models) or inference scaling (faster processing), memory scaling addresses knowledge gaps that model size alone cannot close.

The benefits extend beyond accuracy. Agents with better memory reduce redundant exploration and resolve queries faster by recalling relevant schemas or successful past actions, yielding gains in both accuracy and efficiency.

Memory Scaling vs. Other Approaches

Memory scaling offers a distinct advantage over continual learning, which typically updates model parameters. Continual learning is computationally expensive and brittle for multi-user environments. Memory scaling, by freezing LLM weights and expanding shared external state, allows workflows learned by one user to be immediately applied to another without retraining.

While large context windows can provide more information, they are not a substitute for memory. Packing vast amounts of raw data increases latency, cost, and can degrade reasoning quality as irrelevant tokens compete for attention. Memory scaling relies on selective retrieval, surfacing only high-signal information relevant to the current task.
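Selective retrieval can be sketched as a top-k similarity search with a relevance floor, so that only high-signal entries reach the prompt. The toy vectors and the `retrieve` helper below are illustrative stand-ins for a real embedding model and memory store, not Databricks' implementation:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, memories, k=2, min_score=0.5):
    """Return at most k memories that clear a relevance floor,
    rather than packing the whole store into the prompt."""
    scored = [(cosine(query_vec, vec), text) for text, vec in memories]
    scored = [(s, t) for s, t in scored if s >= min_score]
    scored.sort(reverse=True)
    return [t for _, t in scored[:k]]

# Toy 3-d "embeddings"; a real system would use a learned embedding model.
memories = [
    ("revenue is net of refunds",      [0.9, 0.1, 0.0]),
    ("use fiscal, not calendar, year", [0.8, 0.2, 0.1]),
    ("holiday party photos",           [0.0, 0.1, 0.9]),
]
print(retrieve([1.0, 0.0, 0.0], memories))
```

Note that the `min_score` floor matters as much as the ranking: it is what keeps irrelevant tokens from competing for attention in the context window.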


Types of Memory

The first key distinction is between episodic and semantic memory. Episodic memories are raw interaction records, while semantic memories are distilled generalizations. Each requires different storage and retrieval strategies.
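As a rough illustration of the episodic/semantic split (the record fields here are hypothetical, not Databricks' schema), the two types might be modeled like this:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class EpisodicMemory:
    """A raw interaction record, kept verbatim for later distillation."""
    user_query: str
    agent_response: str
    feedback: str  # e.g. "helpful" / "unhelpful"
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class SemanticMemory:
    """A distilled generalization extracted from one or more episodes."""
    rule: str                  # the generalization itself
    source_episode_count: int  # provenance: how many episodes support it

episode = EpisodicMemory(
    user_query="What was ARR last quarter?",
    agent_response="SELECT annual_recurring_revenue ...",
    feedback="helpful",
)
rule = SemanticMemory(rule="'ARR' means the annual_recurring_revenue column",
                      source_episode_count=1)
print(rule.rule)
```

The split drives storage choices: episodic records grow append-only and are queried by similarity, while semantic rules stay small, curated, and frequently injected.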

Memory also differs by scope: personal versus organizational. Personal memories cater to individual user preferences, while organizational memories capture shared knowledge like naming conventions or business rules. The system must manage retrieval and updates appropriately, respecting permissions.

Experiments in Memory Scaling

Databricks' MemAlign framework stores past interactions as episodic memories, distills them into semantic memories using an LLM, and retrieves relevant entries at inference. They tested MemAlign on Databricks Genie Spaces, a natural-language interface for data queries.
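MemAlign's internals are not detailed here, so the loop below is a hypothetical sketch of the store → distill → retrieve cycle only. `summarize_with_llm` stands in for the actual LLM distillation step, and the keyword-overlap retrieval is a placeholder for embedding search:

```python
def summarize_with_llm(episodes):
    """Placeholder for LLM distillation: a real system would generalize
    across episodes; here we simply keep the helpful ones."""
    return [e["query"] + " -> " + e["answer"]
            for e in episodes if e["feedback"] == "helpful"]

class MemoryStore:
    def __init__(self):
        self.episodic = []   # raw interaction logs
        self.semantic = []   # distilled rules

    def record(self, query, answer, feedback):
        self.episodic.append({"query": query, "answer": answer,
                              "feedback": feedback})

    def distill(self):
        self.semantic = summarize_with_llm(self.episodic)

    def retrieve(self, query):
        # Naive keyword overlap; production systems would use embeddings.
        words = set(query.lower().split())
        return [r for r in self.semantic if words & set(r.lower().split())]

store = MemoryStore()
store.record("total sales 2024", "SELECT SUM(amount) FROM sales", "helpful")
store.record("weather today", "I cannot help with that", "unhelpful")
store.distill()
print(store.retrieve("sales by region"))
```

Filtering on feedback at distillation time mirrors the article's point that noisy logs, filtered for helpfulness, can feed the memory store directly.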

When scaling with labeled data across 10 Genie Spaces, accuracy rose from near zero to 70%, surpassing expert-curated baselines. Reasoning steps dropped from about 20 to 5, indicating faster retrieval and less exploration.

Scaling with unlabeled user logs from a live Genie Space showed similar gains. Agent performance jumped from 2.5% to over 50% after ingesting just 62 log records, outperforming the expert-curated baseline. Reasoning steps stabilized around 4.3.

This demonstrates that even noisy, real-world interactions, filtered for helpfulness, can substitute for costly hand-engineered instructions, enabling agents to improve continuously from normal usage.

Organizational Knowledge Stores

Databricks also explored pre-computing organizational knowledge into structured memory stores. This includes table schemas, dashboard queries, and business glossaries.

When evaluated on internal benchmarks, adding this knowledge store improved accuracy by approximately 10%. These gains were most pronounced for questions requiring vocabulary bridging, table joins, and column-level knowledge.
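One way such a store could support vocabulary bridging is a pre-computed glossary mapping business terms to physical tables and columns, consulted before query generation. The entries and helper below are hypothetical, not Databricks' actual schema:

```python
# A business glossary mapping user vocabulary to physical column names,
# pre-computed once and injected into the agent's context when terms match.
GLOSSARY = {
    "churn":     ("customer_events", "cancellation_flag"),
    "arr":       ("finance_metrics", "annual_recurring_revenue"),
    "headcount": ("hr_roster", "active_employee_count"),
}

def bridge_vocabulary(question):
    """Return (table, column) hints for glossary terms found in the question."""
    hints = {}
    for term, (table, column) in GLOSSARY.items():
        if term in question.lower():
            hints[term] = {"table": table, "column": column}
    return hints

print(bridge_vocabulary("How did ARR change versus churn last quarter?"))
```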

Infrastructure for Memory Scaling

Supporting memory scaling in production requires robust infrastructure. Key challenges include scalable storage, memory management, and governance.

Simple file systems are inadequate for large-scale, multi-user memory. Dedicated data stores, particularly modern PostgreSQL-based systems supporting vector and full-text search, offer a more unified solution. Serverless variants that separate storage and compute are ideal.
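The hybrid retrieval such a store enables, blending a full-text rank with a vector distance the way PostgreSQL can combine tsvector ranking with a pgvector index, can be illustrated with a pure-Python stand-in for the scoring:

```python
import math

def vector_score(q, v):
    """Cosine similarity, standing in for a pgvector distance."""
    dot = sum(a * b for a, b in zip(q, v))
    return dot / (math.sqrt(sum(a * a for a in q)) *
                  math.sqrt(sum(b * b for b in v)))

def text_score(query, doc):
    """Crude full-text score: fraction of query words present in the doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q)

def hybrid_search(query, query_vec, docs, alpha=0.5):
    """Blend lexical and vector relevance into one ranking."""
    ranked = sorted(
        docs,
        key=lambda d: alpha * text_score(query, d["text"])
                      + (1 - alpha) * vector_score(query_vec, d["vec"]),
        reverse=True,
    )
    return [d["text"] for d in ranked]

docs = [
    {"text": "quarterly revenue dashboard", "vec": [0.9, 0.1]},
    {"text": "office seating chart",        "vec": [0.1, 0.9]},
]
print(hybrid_search("revenue trends", [1.0, 0.0], docs)[0])
```

Doing both scores in one store is the "unified" advantage the article describes: no separate keyword index and vector database to keep in sync.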

Memory management is crucial. Bootstrapping new agents with existing enterprise assets like wikis and documentation provides an initial memory base to overcome cold-start problems.

© 2026 StartupHub.ai. All rights reserved.