We’re approaching a new phase in the AI ecosystem, one that’s not without uncertainty and discourse. But over the next several months, one thing is certain: the arrival of Agentic AI workflows that can actually deliver on the hyperbolic proclamations pundits have been bending over backwards to make.
Indeed, the Agentic AI ecosystem is slowly taking shape, paving a smoother road to enterprise adoption. Frameworks and developer tools like LangChain are gaining momentum and acceptance, allowing developers to build workflows where Large Language Models (LLMs) and fine-tuned LLMs interact with each other and with APIs and services to execute a task. Building Agents is becoming the focal point of the industry. In the words of LangChain’s CEO Harrison Chase, it’s like “running LLMs in a for-loop, and asking the LLM to reason and plan what the next best step is to achieve the task at hand.” But running LLMs at scale, as a complex workflow requires, isn’t clear-cut yet.
We’re definitely far beyond the debut of ChatGPT in November 2022, but we’ve arrived at a strange phase of limbo. There are only marginal gains from new foundational models, and the critical components of these would-be workflows can’t yet feasibly ingest and process the steps of an Agentic AI workflow, like reflection or planning, let alone multi-agent collaboration. Why? The Transformer architecture doesn’t quite fit the bill, and the new, growing demands are straining it. Context windows and costs are the main points of contention.
Among the LLM leaderboards, the dust is beginning to settle, hinting at which architecture might lead us towards Agentic AI.
Or Dagan, VP of Foundational Models at AI21 Labs, has been working on this pioneering technology since 2018, having led the development of Wordtune and the company’s latest family of foundational models, which is earning critical acclaim for steering the industry in a new direction. “Transformers have been pivotal in advancing Natural Language Processing, yet their reliance on extensive memory, quadratic processing, and demanding compute resources pose significant hurdles,” explained Dagan. “These challenges make it costly and impractical to scale Transformers for tasks involving lengthy documents or vast datasets.”
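The “quadratic processing” Dagan mentions is easy to see with back-of-envelope arithmetic: self-attention scores every token against every other token, so the attention matrix grows with the square of the context length. A minimal illustration:

```python
def attention_matrix_entries(seq_len: int) -> int:
    """Entries in one self-attention score matrix: every token attends
    to every token, so the count grows as seq_len squared."""
    return seq_len * seq_len

# Doubling the context length quadruples the attention work:
short = attention_matrix_entries(4096)
long = attention_matrix_entries(8192)
```

This is why stretching a Transformer’s context window from thousands to hundreds of thousands of tokens is not a linear increase in cost, and why alternative architectures target this bottleneck.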
Long Context and Accuracy Over Multimodality
If there’s ever been a differentiating moment in the industry, it simply isn’t multimodality, yet. New releases like OpenAI’s GPT-4o and even Apple’s 4M AI model are beyond impressive, pushing the envelope with capabilities for creating and editing images and video, and even 3D asset generation with Meta’s 3D TextureGen. But it’s still the basics that matter. In the Agentic AI realm, enterprise adoption hinges on text, data analysis, and coding tasks, combined with accurate outputs from long inputs.
The current problem with long-context prompts is that models aren’t quite able to generate a coherent response grounded in internal data; this failure mode is known as hallucination. Retrieval Augmented Generation (RAG) is touted as a solution. It excels in scenarios where keyword-based searches suffice, like retrieving factual information about well-defined topics. Yet challenges arise in more complex tasks requiring abstract reasoning, where keyword searches may fail to pinpoint relevant documents, or where models may ignore retrieved context in favor of internal knowledge. Moreover, implementing RAG is resource-intensive. RAG alone cannot entirely eliminate model hallucinations, and newer model architectures need to make better use of RAG-retrieved data.
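The core RAG idea, in its simplest keyword-matching form, looks like this. This is a deliberately naive sketch: real systems use embedding-based vector search rather than word overlap, and the document strings here are invented for illustration.

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Naive keyword retrieval: rank documents by how many words
    they share with the query (stand-in for real vector search)."""
    query_words = set(query.lower().split())
    overlap = lambda d: len(query_words & set(d.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:k]

def build_rag_prompt(query: str, docs: list[str]) -> str:
    """Stuff the retrieved passages into the prompt so the model is
    nudged to answer from provided context, not internal knowledge."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The weaknesses the article describes are visible even here: if the relevant document shares no keywords with the query, retrieval misses it, and nothing in the prompt forces the model to actually prefer the retrieved context over what it memorized in training.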
