The year 2025 confirmed that the future of artificial intelligence lies not just in raw model size, but in integration, efficiency, and accessibility. The "Mixture of Experts" year-end review, hosted by Tim Hwang, brought together key voices—Chris Hay, Gabe Goodhart, Kaoutar El Maghraoui, Aaron Baughman, and Abraham Daniels—to dissect the defining trends of the past year and forecast the trajectory toward 2026. The panel's central narrative covered the biggest moments in AI: the maturation of agents, the rise of open source, and the resulting pressure on the underlying hardware and software ecosystems.
Chris Hay immediately defended his prior prediction that 2025 would be the "Year of the Super Agent," despite Hwang's playful assertion that agents were "the dog that didn't bark." Hay argued that the foundational technology—advanced reasoning and expanded tool usage—did indeed arrive, albeit integrated into existing products like ChatGPT Deep Research and Claude Code. He emphasized that the agent of today is not a single, specialized function, but an orchestrator. Hay noted that current models are able to "think much longer and they're able to plan," allowing them to chain together multiple tools to achieve complex goals, such as generating an entire presentation from a single prompt. This shift moves beyond simple one-off tasks toward autonomous workflows, redefining what a functional AI agent truly is in the commercial space.
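To make that orchestration pattern concrete, here is a minimal, self-contained Python sketch of an agent chaining tools toward a goal. The tool functions and the hard-coded plan are illustrative stand-ins for what a reasoning model would produce at runtime, not any vendor's actual API.

```python
# Hypothetical sketch of the "agent as orchestrator" pattern: a plan is
# executed step by step, with each tool consuming the previous result.

def search_web(query: str) -> str:
    """Stand-in for a web-search tool."""
    return f"results for '{query}'"

def summarize(text: str) -> str:
    """Stand-in for an LLM summarization call."""
    return f"summary of [{text}]"

def make_slides(outline: str) -> str:
    """Stand-in for a presentation-generation tool."""
    return f"deck built from [{outline}]"

TOOLS = {"search": search_web, "summarize": summarize, "slides": make_slides}

def run_agent(goal: str) -> str:
    # In a real agent, a reasoning model would plan these steps; the plan
    # is fixed here so the sketch runs standalone.
    plan = [("search", goal), ("summarize", None), ("slides", None)]
    result = ""
    for tool_name, arg in plan:
        # Feed the goal into the first tool, then chain each output forward.
        result = TOOLS[tool_name](arg if arg is not None else result)
    return result

print(run_agent("Q4 market trends"))  # deck built from a searched, summarized brief
```

The chaining of outputs into inputs is the essential move: it is what lets a single prompt fan out into a multi-step workflow rather than a one-off completion.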
Gabe Goodhart focused on the breakthrough year for open source models, citing advancements like Kimi K2 Thinking, which have brought open models close to performance parity with proprietary systems. This rapid convergence has shifted the primary friction point away from model quality and toward the surrounding infrastructure and user experience. The open source community excels at individual components but struggles with unified packaging: integrating disparate tools and models into a coherent user-facing product remains complex. Goodhart suggested that matching the user experience and delight of closed systems is the next major hurdle open source must clear before it can truly dominate across all domains.
The material reality of AI was a major theme, particularly the scarcity of high-end AI accelerators. Kaoutar El Maghraoui highlighted that 2025 cemented AI hardware scarcity as a "structural constraint," not a temporary bottleneck. This unprecedented demand, largely driven by massive frontier model training, has forced the industry to bifurcate its focus: scale-up (massive clusters built on NVIDIA H200/B200 GPUs and AWS Trainium) versus scale-out (efficient, smaller models running locally or at the edge). El Maghraoui predicted that 2026 will be defined by the tension between "frontier versus efficient model classes," noting the increasing feasibility of running models with 1 to 5 billion parameters locally through techniques like 4-bit quantization and specialized neural processing units (NPUs). This efficiency drive is essential: as she put it, "the industry really must scale efficiency instead" of relying solely on compute capacity that remains supply-constrained. The same pressure is driving new hybrid architectures that combine Transformers and State Space Models (SSMs) to optimize for both performance and energy consumption.
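The arithmetic behind local feasibility is straightforward: at 4 bits per weight, a 5-billion-parameter model needs roughly 2.5 GB for its weights alone, which is what makes laptop- and NPU-class deployment plausible. The sketch below shows one common way to load a small model in 4-bit precision using the Hugging Face transformers and bitsandbytes libraries; the model ID and settings are illustrative examples, not specifics from the episode.

```python
# Hedged sketch: loading a small (~2B-parameter) model with 4-bit NF4
# quantization via transformers + bitsandbytes. Any similarly sized
# checkpoint would work in place of the example model ID.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "ibm-granite/granite-3.1-2b-instruct"  # example small model

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # pack weights into 4-bit format
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for quality
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on whatever GPU/CPU is available
)

inputs = tokenizer("Why does 4-bit quantization matter?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The quality trade-off of 4-bit weights is generally modest for models of this size, which is why the technique has become the default on-ramp for edge and laptop inference.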
This push for efficiency dovetails with the rising importance of modularity in multimodal systems, a key area for IBM’s Granite models. Aaron Baughman stressed the need for models that can interpret and act upon diverse data—language, vision, and action—to enable complex, autonomous digital workers. Abraham Daniels detailed the focus on building modular capabilities that leverage adapters and external orchestration layers. This allows enterprises to tailor AI solutions by combining specialized components, avoiding the overhead of monolithic, omni-capable models. This architecture, Daniels argued, allows developers to "call each capability as needed," thereby reducing footprint and increasing practical performance for specific enterprise use cases, such as complex document processing and Retrieval-Augmented Generation (RAG) pipelines.
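A minimal sketch of that adapter-per-capability pattern follows, using the Hugging Face peft library's multi-adapter API. The base model ID and local adapter paths ("adapters/rag", "adapters/doc_extraction") are hypothetical, and the panel did not prescribe this specific stack.

```python
# Hedged sketch: one small base model plus task-specific LoRA adapters,
# swapped in on demand so only the needed capability is active.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "ibm-granite/granite-3.1-2b-instruct"  # example small base model
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)

# Attach one adapter per specialized capability (paths are hypothetical).
model = PeftModel.from_pretrained(base, "adapters/rag", adapter_name="rag")
model.load_adapter("adapters/doc_extraction", adapter_name="doc_extraction")

def run_capability(capability: str, prompt: str) -> str:
    # "Call each capability as needed": activate only the adapter whose
    # skill the request requires; the base weights stay shared in memory.
    model.set_adapter(capability)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(run_capability("doc_extraction", "Extract the invoice total: ..."))
```

Because each LoRA adapter adds only a few megabytes on top of the shared base weights, this design keeps the serving footprint far below that of a monolithic, omni-capable model.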
The panelists agreed that the convergence of these trends sets the stage for a fierce battle over the "front door" of AI. Whether through browser integration (Perplexity, enhanced ChatGPT), mobile operating systems (Apple, Google/Android), or enterprise platforms, the entity that successfully integrates the reasoning, the tools, and the models—regardless of whether those models are open or closed—will capture the market. This fight for the control plane will ultimately hinge on who can provide the most seamless user experience, masking the underlying complexity of the multi-agent, hybrid-architecture systems now becoming the norm.