In a conversation with Latent Space, Barak Lenz, CTO of AI21, revealed the company's latest innovation: the Jamba 3B model. This tiny yet powerful hybrid transformer-state space model aims to bring long-context AI capabilities to edge devices. "Images quickly become 'long context' problems," Lenz stated, explaining that just four images can require thousands of tokens, which makes hybrid architectures vital for on-device AI.
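A rough back-of-the-envelope calculation shows why. Assuming a ViT-style encoder that splits each image into 16x16-pixel patches (an illustrative patch size, not AI21's actual tokenizer), one high-resolution image already maps to thousands of tokens:

```python
# Illustrative arithmetic only; patch size and resolution are assumptions,
# not AI21's numbers. It shows how a handful of images becomes a
# long-context workload.
image_side = 1024          # pixels per side
patch_side = 16            # pixels per patch (assumed, ViT-style)
tokens_per_image = (image_side // patch_side) ** 2
print(tokens_per_image)         # 4096 tokens for a single image
print(4 * tokens_per_image)     # 16384 tokens for just four images
```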
Lenz spoke with Swyx at Latent Space about AI21's journey to becoming a pioneer in hybrid architectures, combining attention mechanisms with Mamba state space layers, an approach that improves long-context efficiency without sacrificing performance. Lenz highlighted, "Deep learning is super cool and super useful, but it's not enough. We wanted to bridge classical AI with new AI."
A core insight from the conversation is the importance of hybrid models for the future of long-context AI. The Jamba 3B model's 1:8 ratio of attention to Mamba layers emerged from extensive ablations, demonstrating how hybrid architectures can handle long contexts efficiently. Lenz also pointed to AI21's long-standing work at the frontier of the field: "We were training models as early as 2018."
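To make the 1:8 ratio concrete, here is a minimal PyTorch sketch of such an interleaving. The `MambaBlock` is only a stand-in (a real implementation would use a selective state space scan, for example from the `mamba_ssm` package), and the dimensions, block count, and layer placement are illustrative assumptions rather than AI21's actual configuration.

```python
# Sketch of a hybrid stack: 1 attention layer per 8 Mamba-style layers.
# Not AI21's code; MambaBlock is a placeholder for a selective SSM layer.
import torch
import torch.nn as nn

class MambaBlock(nn.Module):
    """Placeholder for a Mamba (selective state space) layer."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)  # stand-in for the SSM scan

    def forward(self, x):
        return x + self.proj(x)  # residual connection, as in the real block

class AttentionBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return x + out

def build_hybrid_stack(d_model: int = 512, n_blocks: int = 4) -> nn.Sequential:
    """Each block: one attention layer followed by eight Mamba layers (1:8)."""
    layers = []
    for _ in range(n_blocks):
        layers.append(AttentionBlock(d_model))
        layers.extend(MambaBlock(d_model) for _ in range(8))
    return nn.Sequential(*layers)

model = build_hybrid_stack()
tokens = torch.randn(1, 1024, 512)   # (batch, sequence, hidden)
print(model(tokens).shape)           # torch.Size([1, 1024, 512])
```

Because only a small fraction of layers are attention, the memory that grows with sequence length (the KV cache) shrinks accordingly, which is the efficiency argument for long contexts on constrained devices.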
Another key point is the need for enterprises to adopt model-agnostic orchestration layers. Lenz introduced Maestro, AI21's enterprise AI system, which treats models as "actions" with statistical properties rather than as monolithic solutions. This enables continuous learning and adaptation without locking the enterprise into any single model provider.
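The following is a hypothetical sketch of the "models as actions" idea; it is not Maestro's actual API, only an illustration of an orchestration layer that tracks per-model statistics and keeps adapting as outcomes accumulate. All names and the routing rule are assumptions for the example.

```python
# Hypothetical orchestration sketch (not Maestro): models are registered as
# "actions" with observed statistics, and routing adapts to those statistics.
import time
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ModelAction:
    name: str                         # e.g. "jamba-3b" or any hosted model
    call: Callable[[str], str]        # prompt in, completion out
    successes: int = 0
    failures: int = 0
    latencies: list = field(default_factory=list)

    @property
    def success_rate(self) -> float:
        total = self.successes + self.failures
        return self.successes / total if total else 0.5  # neutral prior

class Orchestrator:
    """Routes requests across registered models instead of hard-wiring one provider."""
    def __init__(self):
        self.actions: dict[str, ModelAction] = {}

    def register(self, action: ModelAction) -> None:
        self.actions[action.name] = action

    def run(self, prompt: str, validate: Callable[[str], bool]) -> str:
        # Try models in order of observed success rate and record each outcome,
        # so routing keeps improving without depending on a single model.
        for action in sorted(self.actions.values(),
                             key=lambda a: a.success_rate, reverse=True):
            start = time.perf_counter()
            output = action.call(prompt)
            action.latencies.append(time.perf_counter() - start)
            if validate(output):
                action.successes += 1
                return output
            action.failures += 1
        raise RuntimeError("no registered model produced a valid answer")
```

A production system would also weigh latency, cost, and task type, but the structure shows why the orchestration layer, not any single model, becomes the stable interface for the enterprise.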
Related Reading
- Tiny AI Model Outperforms Giants Redefining Scaling Laws
- OpenAI's gpt-oss: Open Models for Custom AI Solutions
- OpenAI's Sora, ImageGen, and Codex Reimagine Creative Production
The conversation also touched on the challenges of training at scale, with Lenz drawing from his background in algorithmic trading. He candidly discussed the persistence of optimization issues even with modern architectures and why good engineering is just as crucial as algorithmic innovation. "From the top we always wanted to do both something very scientific, very sound, but also have an application side," he noted, highlighting the company's commitment to both theoretical rigor and practical applications.
Lenz also emphasized the importance of robust infrastructure for training and deploying AI models. While Mamba offers compelling advantages, he explained, the industry's tooling is still largely geared towards attention models, which presents a challenge for wider adoption. He also noted the need for "thousands of GPUs to train" frontier models, underscoring the significant investment required for AI research.

