A US deep-tech startup, Tiiny AI, has officially unveiled the world’s smallest personal AI supercomputer, the Pocket Lab, a claim verified by Guinness World Records. The device is designed to run large language models (LLMs) with up to 120 billion parameters entirely on-device, eliminating the need for cloud connectivity, external servers, or expensive, power-hungry GPUs.
The announcement, made in Hong Kong, represents a direct challenge to the prevailing cloud-centric AI ecosystem dominated by OpenAI and Google. The Pocket Lab measures just 14.2 × 8 × 2.53 cm and operates within a 65W power envelope, aiming to address the growing concerns over privacy, sustainability, and the rising energy costs associated with massive data centers.

Tiiny AI is betting that the real future of advanced intelligence is personal and private. “Cloud AI has brought remarkable progress, but it also created dependency, vulnerability, and sustainability challenges,” said Samar Bhoj, GTM Director of Tiiny AI. The company argues that intelligence should belong to the individual, not the data center, positioning the Pocket Lab as the first step toward truly accessible and private AI.
The device is built to handle the ‘golden zone’ of personal AI—models between 10B and 100B parameters—which Tiiny AI claims satisfies over 80 percent of real-world needs. By scaling up to 120B parameters, the Pocket Lab promises intelligence levels comparable to GPT-4o, enabling PhD-level reasoning, multi-step analysis, and deep contextual understanding, all while keeping sensitive data secure and offline.
The Technology Making Offline LLMs Possible
Achieving server-grade LLM performance on a device weighing only 300g required two core technological breakthroughs. The first is TurboSparse, a neuron-level sparse activation technique designed to significantly improve inference efficiency without sacrificing model intelligence.
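To make the idea concrete, here is a minimal sketch of neuron-level sparse activation in a feed-forward layer. The specifics of TurboSparse are not detailed in the announcement; this illustration simply shows the general principle that only neurons predicted to fire contribute to the output (here the "prediction" is exact for simplicity, whereas real systems use a small, cheap predictor network):

```python
import numpy as np

def sparse_ffn(x, W_up, W_down, threshold=0.0):
    """Sketch of neuron-level sparse activation: skip neurons
    whose pre-activation falls below the firing threshold."""
    pre = x @ W_up                    # pre-activations for all hidden neurons
    active = pre > threshold          # neuron-level sparsity mask (ReLU-style)
    h = np.where(active, pre, 0.0)    # only "hot" neurons contribute
    return h @ W_down, active.mean()  # output and fraction of active neurons

rng = np.random.default_rng(0)
d, n = 64, 256
x = rng.standard_normal(d)
W_up = rng.standard_normal((d, n))
W_down = rng.standard_normal((n, d))

y, frac_active = sparse_ffn(x, W_up, W_down)
print(f"active neurons: {frac_active:.0%}")
```

The efficiency win comes from the fact that, in practice, a large fraction of neurons stay below the threshold for any given input, so most of the matrix multiply can be skipped entirely.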
The second is PowerInfer, an open-source heterogeneous inference engine that dynamically distributes heavy LLM workloads across the device’s custom ARMv9.2 12-core CPU and its dedicated deep Neural Processing Unit (dNPU). This co-design approach allows the Pocket Lab to deliver approximately 190 TOPS of AI compute power, performance that previously required professional GPUs costing thousands of dollars.
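A rough sketch of how such a heterogeneous engine might partition work: neurons that activate frequently ("hot") are pinned to the fast accelerator, while the rarely used cold tail runs on the CPU. The placement policy and numbers below are illustrative, not Tiiny AI's actual scheduler:

```python
from typing import Dict, List

def partition_neurons(activation_freq: List[float],
                      hot_budget: int) -> Dict[int, str]:
    """Assign the most frequently activated neurons to the
    accelerator (dNPU), the cold tail to the CPU."""
    order = sorted(range(len(activation_freq)),
                   key=lambda i: activation_freq[i], reverse=True)
    placement: Dict[int, str] = {}
    for rank, neuron in enumerate(order):
        placement[neuron] = "dNPU" if rank < hot_budget else "CPU"
    return placement

# toy activation statistics for 8 neurons, 3 dNPU slots available
freq = [0.9, 0.1, 0.7, 0.05, 0.8, 0.02, 0.3, 0.6]
placement = partition_neurons(freq, hot_budget=3)
print(sorted(n for n, dev in placement.items() if dev == "dNPU"))  # -> [0, 2, 4]
```

Because activation statistics follow a heavy-tailed distribution, a small budget of accelerator-resident neurons can cover the bulk of inference work, which is the key to squeezing large models into a 65W envelope.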
With 80GB of LPDDR5X memory and a 1TB SSD, the Pocket Lab is positioned for developers, researchers, and professionals who require secure, long-context processing. Because it operates fully offline, it offers true long-term personal memory by storing user data and preferences locally with bank-level encryption, a persistence feature cloud-based systems inherently struggle with.
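The Pocket Lab's actual storage format and cipher are not public ("bank-level encryption" is not a specific standard), but the pattern of encrypted-at-rest local memory can be sketched with a standard authenticated cipher. Everything below, including the `LocalMemory` class, is a hypothetical illustration:

```python
import json
import pathlib
from cryptography.fernet import Fernet  # AES-based authenticated encryption

class LocalMemory:
    """Sketch of persistent personal memory kept encrypted on local disk,
    never leaving the device."""

    def __init__(self, path: str, key: bytes):
        self.path = pathlib.Path(path)
        self.cipher = Fernet(key)

    def save(self, memory: dict) -> None:
        # serialize and encrypt before anything touches the SSD
        self.path.write_bytes(self.cipher.encrypt(json.dumps(memory).encode()))

    def load(self) -> dict:
        return json.loads(self.cipher.decrypt(self.path.read_bytes()))

key = Fernet.generate_key()  # in practice derived from a device-bound secret
store = LocalMemory("memory.enc", key)
store.save({"user": "alice", "prefers": "offline inference"})
print(store.load()["prefers"])
```

The point is persistence without exposure: the memory file is unreadable without the device key, so user history accumulates locally instead of in a provider's data center.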
Tiiny AI is also pushing an open-source agenda, offering one-click deployment for dozens of leading models, including Llama, Mistral, and DeepSeek, alongside popular agent frameworks. The company, formed in 2024 by engineers from MIT, Stanford, and Meta, secured a multi-million dollar seed round in 2025, signaling serious investor confidence in the shift toward decentralized AI hardware.