Martin Keen, a Master Inventor at IBM, breaks down two fundamental approaches to how AI models access and remember information: Long Context and Cache Augmented Generation (CAG). In this video, Keen walks through the mechanism and trade-offs of each method and how they affect a model's ability to process and recall information from long data sources.
Understanding Long Context and CAG
Keen begins by explaining that LLMs rely by default on what they learned during training. To use external knowledge, that knowledge has to be supplied when the model is queried, and he describes two main strategies for doing so. The first, Long Context, feeds the model a large amount of information directly within its input prompt. The second, Cache Augmented Generation (CAG), first retrieves the relevant information and then provides it to the model.
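As a rough illustration of the Long Context strategy, the sketch below simply concatenates every document into a single prompt. The function name, the prompt layout, and the word-based budget are assumptions made for this example and do not come from the video; real context limits are measured in tokens, not words.

```python
def build_long_context_prompt(documents, question, max_words=100_000):
    """Concatenate all documents into one prompt (hypothetical helper)."""
    context = "\n\n".join(documents)
    # Word count is only a crude stand-in for tokens; real limits are in tokens.
    n_words = len(context.split())
    if n_words > max_words:
        raise ValueError(f"context has {n_words} words, budget is {max_words}")
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


prompt = build_long_context_prompt(
    ["First reference document.", "Second reference document."],
    "What do the documents say?",
)
```

Everything the model may need sits in the prompt itself, so the only selection step is the size check.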
The "Lost in the Middle" Problem with Long Context
Keen highlights a significant challenge with the long context approach: the "lost in the middle" phenomenon. He explains that when an LLM processes a very long context window, its ability to accurately recall information from the middle of that context can degrade. The model tends to remember information presented at the beginning and end of the prompt more effectively than information buried in the middle. This is visualized on a graph where context size increases over time, showing a dip in recall accuracy for the middle sections of very large contexts.
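One way to probe this effect is to hide a single known fact at different positions in an otherwise irrelevant context and ask the model to retrieve it. The sketch below only builds the prompts; the helper names, the filler text, and the example fact are all made up for illustration, and the call to an actual model is left out.

```python
def place_needle(filler, needle, depth):
    """Insert `needle` into a list of filler sentences.

    `depth` is relative: 0.0 puts the needle first, 1.0 puts it last.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    index = round(depth * len(filler))
    return filler[:index] + [needle] + filler[index:]


def build_probe_prompts(filler, needle, question,
                        depths=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return a {depth: prompt} mapping, one prompt per needle position."""
    prompts = {}
    for depth in depths:
        context = " ".join(place_needle(filler, needle, depth))
        prompts[depth] = f"{context}\n\nQuestion: {question}\nAnswer:"
    return prompts


# Made-up filler and fact, used only to exercise the helpers.
filler = [f"Filler sentence number {i}." for i in range(1000)]
needle = "The access code for the vault is 7291."
prompts = build_probe_prompts(
    filler, needle, "What is the access code for the vault?"
)
```

Sending each prompt to the model under test and checking whether the answer contains the hidden fact gives one recall measurement per depth. Lower recall at the middle depths than at 0.0 and 1.0 would match the pattern Keen describes.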
