Ben Kus, CTO of Box, described the difficult but ultimately transformative process of integrating artificial intelligence into the company's enterprise content platform, arguing that the real power of AI in complex business environments lies not in a single, monolithic large language model, but in an orchestrated "agentic" architecture. Kus detailed the critical shift from initial, often brittle, LLM-based solutions to a robust, multi-step agentic framework designed to tackle the formidable challenge of unstructured enterprise data.
Kus shared that Box's initial foray into AI in 2023 involved deploying LLMs for tasks like metadata extraction, summarization, and Q&A. While promising for simple use cases, these pure LLM approaches quickly hit a wall when confronted with the vast, often messy, and highly variable unstructured data prevalent across enterprise documents. The core problem was reliability. LLMs, despite their intelligence, proved prone to "hallucinations, inconsistency, and fragility," making them unsuitable for mission-critical data extraction where accuracy is paramount. Debugging these black-box systems was notoriously difficult, leading to what Kus described as "despair" within the engineering teams.
This frustration catalyzed a fundamental architectural pivot. Box recognized that an LLM is a powerful "brain," but it requires a sophisticated operational framework to be truly effective. The solution was an agentic architecture, where the LLM acts as the central reasoning engine, but is equipped with a suite of tools—ranging from traditional regex patterns to external APIs—and operates within a structured "Plan, Act, Observe, Reflect" loop. This framework allows the agent to intelligently assess a task, select the appropriate tools, execute steps, validate outcomes, and self-correct, much like a human expert.
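The loop Kus describes can be sketched in a few dozen lines. The sketch below is a hypothetical illustration, not Box's actual code: the tool names, the keyword-based `plan` step (which a real system would delegate to an LLM), and the retry logic are all assumptions made to show the Plan, Act, Observe, Reflect shape.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    """Minimal Plan-Act-Observe-Reflect loop (illustrative only)."""
    tools: dict[str, Callable[[str], str]]           # tool name -> callable
    history: list[str] = field(default_factory=list)  # decision trace

    def plan(self, task: str) -> str:
        # A real system would ask an LLM to pick a tool; this naive
        # keyword match stands in for that reasoning step.
        return "regex" if "invoice" in task else "llm"

    def act(self, tool_name: str, task: str) -> str:
        return self.tools[tool_name](task)

    def observe(self, result: str) -> bool:
        # Validate the outcome; here an empty result counts as failure.
        return bool(result)

    def run(self, task: str, max_steps: int = 3) -> str:
        result = ""
        for _ in range(max_steps):
            tool = self.plan(task)
            result = self.act(tool, task)
            self.history.append(f"{tool} -> {result!r}")  # observability
            if self.observe(result):
                return result
            task = f"retry: {task}"  # Reflect: adjust and try again
        return result
```

Because every step is recorded in `history`, engineers can replay the agent's decisions after the fact, which addresses the black-box debugging problem Kus described.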
For instance, extracting an invoice number no longer relies solely on an LLM guessing the correct string. Instead, the agent first identifies the document type, then employs a precise regex tool for structured fields, only resorting to the LLM for more ambiguous or unstructured text. This hybrid approach significantly boosts accuracy and reduces the common pitfalls of pure generative models. It provides a level of observability that was previously impossible, allowing engineers to trace the agent's decision-making process and pinpoint errors.
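A hedged sketch of that hybrid path, under the assumption that invoice numbers follow a fixed pattern: the regex, the `doc_type` check, and the `call_llm` stub are all placeholders for illustration, not Box's implementation.

```python
import re

# Assumed invoice-number format; a real deployment would tune this
# per document family.
INVOICE_RE = re.compile(r"\bINV-\d{4,}\b")

def call_llm(prompt: str) -> str:
    """Stand-in for any model API; plug in a real client here."""
    raise NotImplementedError("connect an LLM provider")

def extract_invoice_number(document: str, doc_type: str) -> str:
    if doc_type == "invoice":
        match = INVOICE_RE.search(document)
        if match:
            # Deterministic path: cheap, precise, no hallucination risk.
            return match.group()
    # Ambiguous or unstructured text: only now spend LLM compute.
    return call_llm(f"Extract the invoice number:\n{document}")
```

The design choice is that the generative model becomes the fallback rather than the default, which is also what keeps per-document cost down.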
A crucial takeaway from Box's experience, articulated emphatically by Kus, is the imperative to adopt an agentic architecture from the outset for any serious AI application. "If you're building any kind of serious AI application, you should be building an agentic architecture from day one," he advised. Attempting to retrofit such a system onto a pure LLM foundation is far more complex and costly than designing it in from the ground up. This proactive approach ensures scalability, reliability, and cost-effectiveness by judiciously deploying LLM compute only when necessary, rather than for every single step. Box’s journey underscores that for enterprise AI, the future is not just about smarter models, but smarter systems that orchestrate those models with precision and purpose.

