Omar Sanseviero, Developer Experience Lead at Google DeepMind, recently presented an overview of the Gemma 4 family of open models at the AI Engineer Europe conference. The presentation highlighted the models' capabilities, performance, and the broader impact of open-source AI development. Sanseviero emphasized the rapid progress and broad adoption of Gemma since its initial release, showcasing its versatility across various applications and devices.
Omar Sanseviero's Role at Google DeepMind
As Developer Experience Lead at Google DeepMind, Omar Sanseviero bridges the gap between cutting-edge AI research and practical application. His work focuses on ensuring that developers can easily access, understand, and use DeepMind's advanced AI models. The presentation underscored his team's commitment to fostering a thriving AI developer community through accessible tools and comprehensive resources.
Introducing the Gemma 4 Family of Models
Google DeepMind recently released Gemma 4, a family of open models designed to be both powerful and accessible. Sanseviero noted that the models had launched just days before his presentation, generating significant excitement. The family spans several sizes, from 1 billion to 27 billion parameters, each offering a different trade-off between performance, size, and computational requirements. The models are designed to run anywhere from cloud servers to personal devices, making advanced AI capabilities more widely available.
Gemma Model Sizes and Capabilities
The Gemma 4 models are available in several sizes, including 2B, 4B, 26B A4B, and 31B parameters. Sanseviero presented a table detailing each model's effective parameter count in VRAM, its GPU memory consumption at 8-bit quantization, and its intended use cases. The 2B and 4B models are described as "tiny" and target edge and on-device applications, running on Android, iOS, Raspberry Pi, and Jetson Nano. The 26B A4B model is noted for very fast inference, while the 31B model offers maximum quality and fine-tuning headroom while still fitting on a single consumer GPU.
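The 8-bit figures in such a table follow from simple arithmetic: at 8-bit quantization each parameter occupies one byte. A back-of-the-envelope sketch of that estimate (illustrative only, not the exact numbers from the slide; the 20% overhead factor is an assumption covering activations, KV cache, and runtime):

```python
def vram_estimate_gb(params_billion: float, bits: int = 8, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: params * (bits / 8) bytes, plus ~20% for
    activations, KV cache, and runtime overhead (illustrative only)."""
    bytes_per_param = bits / 8
    return params_billion * bytes_per_param * overhead

# A 27B-parameter model at 8-bit needs on the order of ~32 GB,
# while a 4B model fits in roughly 5 GB.
print(f"{vram_estimate_gb(27):.1f} GB")  # 32.4 GB
print(f"{vram_estimate_gb(4):.1f} GB")   # 4.8 GB
```

The same formula explains why 4-bit quantization roughly halves these footprints, which is what makes the smaller models viable on phones and single-board computers.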
Sanseviero highlighted the efficiency of the 2B and 4B models, stating, "you can run in your own infrastructure, your own devices." He elaborated that these smaller models can perform tasks like multimodal reasoning and on-device inference, demonstrating their practical utility. The larger models, like the 31B, are positioned for more demanding tasks requiring higher intelligence and reasoning capabilities.
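When running these models in your own infrastructure, the runtime typically feeds the model a plain-text prompt in its chat format. A minimal sketch of that formatting, assuming Gemma's published turn markers ("<start_of_turn>" / "<end_of_turn>"; check the model card for the exact template before relying on it):

```python
def format_gemma_prompt(user_message: str) -> str:
    """Wrap a single user message in Gemma-style chat-turn markers so
    the model continues with the assistant ("model") turn."""
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = format_gemma_prompt("Summarize this article in one sentence.")
print(prompt)
```

Libraries such as Hugging Face Transformers apply this template automatically via the tokenizer, but on-device runtimes often expect the caller to construct it by hand.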
Gemma's Performance and Benchmarks
The presentation included a "Chatbot Arena Elo Score" chart, comparing Gemma models against other open models based on user preference. Gemma 3 (27B) achieved an Elo score of 1336, positioning it competitively among leading open models. Sanseviero pointed out that the smaller dots below each model represent the number of NVIDIA H100 GPUs required to run them, indicating the computational resources needed. He noted that even the smaller Gemma models are "extremely capable" and can run on-device, which is a significant achievement.
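Elo scores like these map directly to expected head-to-head preference rates. A short sketch of the standard Elo expectation formula (the Arena's exact rating methodology may differ in detail):

```python
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    """Expected probability that model A is preferred over model B
    under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Equal ratings give a 50% expected win rate; a 100-point gap
# corresponds to roughly a 64% preference rate.
print(elo_win_probability(1336, 1336))            # 0.5
print(round(elo_win_probability(1336, 1236), 2))  # 0.64
```

This is why small Elo gaps between leading models correspond to nearly even user preferences, while the GPU-count dots on the chart can differ by an order of magnitude.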
Further benchmarks covered MMLU Pro, GPQA Diamond, and MMMU Pro, which assess advanced general knowledge, PhD-level scientific reasoning, and complex multimodal understanding, respectively. The charts showed Gemma models performing consistently well across these capabilities, in some cases outperforming substantially larger models.
Community Adoption and Open Source Ecosystem
Sanseviero emphasized the rapid adoption of Gemma by the developer community, citing over 450 million model downloads and more than 100,000 community variants. He highlighted examples of projects leveraging Gemma, including running models on devices like the Nintendo Switch and integrating them into applications like Android Studio. The open-source nature of Gemma, released under the Apache 2.0 license, is a key factor in this widespread adoption, allowing developers to freely fine-tune and deploy the models for their specific needs.
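Many of those 100,000+ community variants are parameter-efficient fine-tunes. As an illustration of why fine-tuning an open model is cheap, here is a sketch of how few parameters a LoRA adapter actually trains (a generic LoRA calculation, not a Gemma-specific recipe; the layer shapes below are hypothetical):

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int, n_matrices: int) -> int:
    """A LoRA adapter replaces each frozen (d_out x d_in) weight update
    with two low-rank factors (d_out x rank and rank x d_in), so only
    rank * (d_in + d_out) parameters are trained per adapted matrix."""
    return n_matrices * rank * (d_in + d_out)

# E.g. rank-8 adapters on 4 projection matrices per layer across
# 32 layers, hidden size 4096 (illustrative shapes):
total = lora_trainable_params(4096, 4096, rank=8, n_matrices=4 * 32)
print(f"{total:,} trainable parameters")  # 8,388,608
```

Training ~8 million adapter parameters instead of billions of base weights is what lets hobbyists produce fine-tuned variants on a single consumer GPU.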
"It's not just about providing a model, it's about enabling the ecosystem to build on top of it," Sanseviero stated, emphasizing the importance of community-driven innovation. He showcased various tools and platforms that have integrated Gemma, such as Hugging Face, Unsloth AI, and Google AI Studio, demonstrating the growing support and accessibility of the Gemma ecosystem.
Gemma's Capabilities: Multimodal, On-Device, and Beyond
The presentation also touched upon Gemma's multimodal capabilities, showing how it can process audio, images, and text simultaneously. This enables tasks such as understanding speech in multiple languages, analyzing images, and generating text from visual or auditory input. Sanseviero highlighted research into specialized use cases, such as medical imaging analysis with models like MedGemma and MedSigLIP, which process medical text and images to assist in diagnosis and research.
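Multimodal prompts are usually expressed as a list of typed content parts within a chat message. A sketch of one common layout, modeled loosely on the chat formats used by open-model toolchains (the field names here are illustrative, not a specific API):

```python
def build_multimodal_message(text: str, image_path: str) -> dict:
    """Combine an image reference and a text instruction into one
    user message made of typed content parts (illustrative schema)."""
    return {
        "role": "user",
        "content": [
            {"type": "image", "path": image_path},
            {"type": "text", "text": text},
        ],
    }

msg = build_multimodal_message("Describe any abnormality in this scan.", "scan.png")
print([part["type"] for part in msg["content"]])  # ['image', 'text']
```

A processor or chat template then interleaves the image embeddings with the tokenized text before the sequence reaches the model.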
The ability to run models efficiently on devices with limited resources, such as smartphones, was a recurring theme. Sanseviero mentioned a simple flag, "--override-tensor per_layer_token_embd.weight=CPU", which keeps the per-layer token-embedding weights in CPU memory so that the rest of the model fits in limited GPU memory, making the models usable even without high-end hardware.
The Future of Gemma and Open AI
Sanseviero concluded by expressing excitement about the future of Gemma and the broader open-source AI movement. He encouraged developers to experiment with the models, build new applications, and share their creations with the community. The rapid progress and widespread adoption of Gemma demonstrate the power of open-source AI in driving innovation and making advanced technology accessible to a wider audience.
The presentation also touched upon initiatives like AI Singapore's SEA-LION models and research from startups such as Sarvam, which are also contributing to the open-source AI ecosystem, particularly in specialized domains like multilingual AI and medical research.
The overall message was clear: Google DeepMind is committed to democratizing AI through open models, and the Gemma family is a significant step forward in that mission, empowering developers worldwide to build the next generation of AI-powered applications.
