For years, the rule in AI has been simple: bigger models require bigger data centers. Tiiny AI just broke that rule. The deep-tech startup demonstrated a 120-billion-parameter large language model (LLM) running fully offline on a 14-year-old consumer PC: a 2011-era machine with an Intel Core i3-530 and 2 GB of DDR3 RAM.
This isn't just a clever stunt; it's a direct challenge to the cloud-centric model dominating the industry. The demonstration, captured in a single uninterrupted take, shows the GPT-OSS 120B model operating at speeds approaching 20 tokens per second, all while the host PC remains completely disconnected from the internet.
