Google's Cormac Brick on Tiny LLMs for On-Device Agents

Google's Cormac Brick discusses the fine-tuning of Tiny LLMs for on-device agents, highlighting the benefits of LiteRT-LM and Gemma 4 for edge AI applications.

Presentation slide showing the title 'TLMs: Tiny LLMs and Agents on Edge Devices' with a network diagram.
Image credit: AI Engineer Europe · AI Engineer

Cormac Brick, a Principal Engineer at Google AI Edge, recently presented on fine-tuning Tiny LLMs (TLMs) for on-device agents. The presentation, titled "From 46% to 90%: Fine-Tuning Tiny LLMs for On-Device Agents," detailed how these smaller, more efficient language models are being integrated into devices to provide responsive AI experiences. Brick highlighted the growing importance of running AI directly on devices, citing reduced latency, enhanced privacy, offline usability, and lower cost compared to cloud-based solutions.

The Google AI Edge Stack and TLMs

Brick outlined the Google AI Edge stack, which includes components such as MediaPipe and LiteRT-LM, the latter a cross-platform runtime designed to run LLMs on Android, Chrome, and iOS. He explained that TLMs, defined as models with fewer than a billion parameters, are small enough to be bundled directly into applications, offering greater customization and reach. The presentation also touched on the applicability of these models beyond Android, extending to iOS, macOS, Windows, web, and IoT devices.

Benefits of On-Device AI

Brick emphasized the core advantages of running AI on the edge, starting with a better user experience: latency is lower and features no longer depend on the network. Privacy is another significant benefit, as sensitive data can be processed directly on the device without being sent to the cloud. The ability to function offline also keeps AI features available regardless of network connectivity. Finally, running inference on the device reduces reliance on data center resources, which lowers serving costs.

Gemma 4 and Agent Skills

A significant portion of the presentation focused on Gemma 4, a recent LLM from Google, and its application in creating on-device agents. Brick demonstrated how Gemma 4, when integrated with AICore and the LiteRT-LM runtime, can power sophisticated agent skills. He showcased the Google AI Edge Gallery app, which features on-device AI functionality powered by Gemma 4, such as text summarization, interactive chat, and image analysis. The presentation also highlighted the potential for custom agent skills, with instructions and examples available on GitHub for developers who want to build their own.

System vs. In-App Generative AI

Brick differentiated between system-level and in-app generative AI. System GenAI is pre-loaded onto devices and offers highly optimized, powerful features, with models typically ranging from 2 to 4 billion parameters. In-app GenAI uses smaller models (100 million to 1 billion parameters), is integrated directly within individual applications, and offers greater customization for specific tasks. He noted that while system-level AI provides broad capabilities, in-app AI allows for tailored experiences.
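
As a rough illustration of why these parameter ranges matter for packaging, the sketch below estimates weight storage for the model sizes mentioned in the talk. The bit widths (16, 8, and 4 bits per weight) are assumptions chosen for illustration, not figures from the presentation, and the function name is hypothetical. The estimate covers weights only and ignores runtime memory such as activations and the KV cache.

```python
def weight_footprint_mb(num_params: int, bits_per_weight: int) -> float:
    """Approximate weight storage in megabytes (10**6 bytes).

    Back-of-the-envelope only: parameters x bits per weight, converted to bytes.
    Real model files also carry metadata, and some layers may be stored at a
    different precision than the rest.
    """
    return num_params * bits_per_weight / 8 / 1e6


# Parameter counts taken from the ranges quoted in the article.
MODEL_SIZES = {
    "in-app, low end (100M)": 100_000_000,
    "in-app, high end (1B)": 1_000_000_000,
    "system, low end (2B)": 2_000_000_000,
    "system, high end (4B)": 4_000_000_000,
}

# Assumed storage precisions, for illustration only.
BIT_WIDTHS = (16, 8, 4)

for label, params in MODEL_SIZES.items():
    sizes = ", ".join(
        f"{bits}-bit: {weight_footprint_mb(params, bits):,.0f} MB"
        for bits in BIT_WIDTHS
    )
    print(f"{label:<24} {sizes}")
```

Under these assumptions, a 100-million-parameter model at 4 bits per weight needs about 50 MB for its weights, while a 4-billion-parameter model at the same precision needs about 2,000 MB. That gap is one way to see why the smaller models can ship inside an individual app while the larger ones are a better fit for a shared, pre-loaded system component.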

TLMs in Production and Future Directions

The presentation concluded with a look at TLMs in production, emphasizing the efficiency and performance gains achieved through fine-tuning and optimization. Brick cited the use of LiteRT-LM to deploy models like Gemma 4 across a wide range of hardware, from mobile devices to embedded systems. He also pointed to the availability of pre-built TLMs on Hugging Face, encouraging developers to experiment with these models. Ongoing work on LLMs for edge devices, including new API releases and expanded platform support, points toward more capable on-device AI experiences.

© 2026 StartupHub.ai. All rights reserved. Do not enter, scrape, copy, reproduce, or republish this article in whole or in part. Use as input to AI training, fine-tuning, retrieval-augmented generation, or any machine-learning system is prohibited without written license. Substantially-similar derivative works will be pursued to the fullest extent of applicable copyright, database, and computer-misuse laws. See our terms.