A recent report from OpenRouter and Andreessen Horowitz (a16z) has shed light on the burgeoning market for open-weight LLMs. The report categorizes models by parameter count: small (under 15 billion parameters), medium (15-70 billion), and large (over 70 billion). While the data highlights a clear market fit for medium-sized models, leading to a surge in their usage, it also notes a relative decline in the share of small models. This narrative, however, demands a closer look, particularly concerning the reality of local deployment of small AI models.
Beyond the API: The Invisible Ecosystem
The core issue lies in the report's methodology. The OpenRouter dataset, by its nature, captures only models accessed via managed API services. This overlooks a massive, rapidly growing segment of the AI landscape: small models optimized for efficient execution on consumer hardware, from modern CPUs to readily available GPUs. Tools like llama.cpp, llamafile, ollama, and LM Studio are enabling a decentralized ecosystem where these smaller, more accessible models are deployed locally. This local deployment of small AI models remains largely invisible to API-centric analyses.
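To make that concrete, here is a minimal sketch of what such a deployment looks like in practice, using the llama-cpp-python bindings for llama.cpp. The GGUF filename and the parameter values are placeholders, not prescriptions; any small quantized model already downloaded to disk would work.

```python
# Minimal local inference with llama.cpp's Python bindings (pip install llama-cpp-python).
# The model path is a hypothetical placeholder; substitute any small quantized
# GGUF model you have on disk.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/small-model-q4.gguf",  # hypothetical local file
    n_ctx=2048,     # modest context window keeps memory use laptop-friendly
    n_threads=4,    # a handful of CPU cores; no GPU and no API key required
    verbose=False,
)

# A single completion, served entirely from local hardware.
output = llm(
    "Explain in one sentence why local LLM inference needs no cloud endpoint.",
    max_tokens=64,
)
print(output["choices"][0]["text"].strip())
```

Nothing in this flow ever produces an API call for a managed service to count, which is exactly why this usage never shows up in a dataset like OpenRouter's.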
It's highly probable that these small AI models are flourishing in specialized, utility-focused applications: on-device translation, targeted summarization, and localized data processing, often within private, self-contained environments. This aligns with the lower-cost, lower-usage quadrant often observed in broader AI cost-usage analyses, a segment that platforms like any-llm-platform are only beginning to illuminate. A sketch of one such task follows.
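As an illustration of the summarization use case, the sketch below drives a small model through the ollama Python client against a locally running ollama server. The model tag is an assumption; any small model pulled beforehand with `ollama pull` would do.

```python
# Local, private summarization via the ollama Python client (pip install ollama).
# Assumes an ollama server is running locally; the "gemma2:2b" tag is an
# illustrative assumption, not a recommendation.
import ollama

document = (
    "Quarterly note: moving inference on-device halved translation latency "
    "while keeping all customer text on local machines."
)

response = ollama.chat(
    model="gemma2:2b",  # hypothetical small local model tag
    messages=[
        {"role": "user", "content": f"Summarize in one sentence:\n{document}"},
    ],
)
print(response["message"]["content"])
```

The document never leaves the machine, which is the whole appeal for private, self-contained environments.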
The OpenRouter report, while valuable for tracking API trends, misses the quiet revolution happening on laptops and local servers worldwide, one that never touches a cloud-based API endpoint. The future of AI isn't just in the cloud; it's increasingly on your desktop, powering specific tasks with efficient, locally deployed small models.



