Researchers have unveiled Mamba-3, a significant evolution of the state space model (SSM) family that shifts the optimization focus squarely onto inference efficiency. This marks a departure from its predecessor, Mamba-2, which prioritized training speed. The latest iteration aims to meet the growing demand for faster LLM deployment and agentic workflows.
Developed through a collaboration between Carnegie Mellon University, Princeton University, Cartesia AI, and Together AI, Mamba-3 introduces a more expressive recurrence formula, complex-valued state tracking, and a multi-input, multi-output (MIMO) variant. These enhancements reportedly boost accuracy without compromising decoding speed.
At the 1.5-billion-parameter scale, Mamba-3's single-input, single-output (SISO) version achieves lower prefill and decode latency than Mamba-2, Gated DeltaNet, and even Llama-3.2-1B (a Transformer) across various sequence lengths. The team has also open-sourced the underlying kernels, built with Triton, TileLang, and CuTe to maximize hardware performance.
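For readers unfamiliar with why SSMs lend themselves to fast decoding, the sketch below shows one decode step of a generic diagonal linear SSM with a complex-valued state. It is not Mamba-3's actual recurrence, which this article does not spell out. The function names and all parameter values are hypothetical and chosen only for illustration. The point is that the state has a fixed size, so the work per generated token does not grow with sequence length.

```python
import cmath

def ssm_decode_step(state, x_t, a, b, c):
    """One decode step of a generic diagonal linear SSM.

    Illustrative only: this is NOT Mamba-3's recurrence.
    state, a, b, c are equal-length lists of complex numbers.
    """
    # h_t = a * h_{t-1} + b * x_t, applied independently per state channel
    new_state = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, state, b)]
    # y_t = Re(sum_i c_i * h_t[i]): project the complex state to a real output
    y_t = sum(c_i * h_i for c_i, h_i in zip(c, new_state)).real
    return new_state, y_t

def decode(xs, a, b, c):
    """Run the recurrence over a sequence, carrying only a fixed-size state."""
    state = [0j] * len(a)
    ys = []
    for x_t in xs:
        state, y_t = ssm_decode_step(state, x_t, a, b, c)
        ys.append(y_t)
    return state, ys

# Hypothetical parameters: each mode decays (magnitude < 1) and rotates (nonzero phase).
a = [0.9 * cmath.exp(0.5j), 0.8 * cmath.exp(-1.0j)]
b = [1.0 + 0j, 0.5 + 0j]
c = [1.0 + 0j, 1.0 + 0j]

final_state, outputs = decode([1.0, 0.0, -1.0, 2.0], a, b, c)
```

Whatever the input length, `final_state` keeps the same two entries. A Transformer, by contrast, keeps a key-value cache that grows with every token it processes. The complex transition values let each state channel rotate as well as decay, which is the general idea behind complex-valued state tracking.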
