A recent report from OpenRouter and Andreessen Horowitz (a16z) has shed light on the burgeoning market for open-weight LLMs. The report categorizes models by parameter count: small (under 15 billion), medium (15-70 billion), and large (over 70 billion). While the data highlights a clear market-model fit for medium-sized models, leading to a surge in their usage, it also notes a relative decline in the share of small models. This narrative, however, demands a closer look, particularly concerning the reality of local deployment of small AI models.
Beyond the API: The Invisible Ecosystem
The core issue lies in the report's methodology. The OpenRouter dataset, by its nature, captures only models accessed via managed API services. This overlooks a massive, rapidly growing segment of the AI landscape: small models optimized for efficient execution on consumer hardware, such as modern CPUs and readily available GPUs. Tools like llama.cpp, llamafile, ollama, and LM Studio are enabling a decentralized ecosystem where these smaller, more accessible models are deployed locally. This critical aspect of small-model deployment remains largely invisible to API-centric analyses.
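To make the contrast with managed APIs concrete, here is a minimal sketch of querying a locally running model through Ollama's HTTP API, which listens on port 11434 by default. The model name is illustrative; this assumes you have Ollama installed and the model pulled locally.

```python
import json
import urllib.request

# Default endpoint exposed by a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    # "stream": False asks the server to return a single JSON object
    # instead of a stream of partial responses.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to a locally running Ollama server and return the completion."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (after e.g. `ollama pull llama3.2` on the local machine):
# print(generate("llama3.2", "Explain local inference in one sentence."))
```

The point of the sketch is that no API key, account, or metered billing is involved; inference happens entirely on the user's hardware, which is exactly why this traffic never appears in an API-gateway dataset like OpenRouter's.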
