Google DeepMind Unveils Gemma 4 AI Models

Google DeepMind releases Gemma 4, a new family of open-source AI models featuring advanced architectures, multimodal capabilities, and improved performance.

[Image: Presentation slide showing the Gemma 4 models and their features. Credit: Google DeepMind / AI Engineer]

Google DeepMind has officially launched its Gemma 4 family of open-source AI models, marking a significant leap forward in accessible artificial intelligence. The new suite of models demonstrates substantial improvements in architecture, performance, and multimodal capabilities compared to its predecessor, Gemma 3.

Meet the Gemma 4 Family

The Gemma 4 family comprises four distinct models, each engineered for specific use cases and performance targets. These range from highly efficient, on-device models suitable for mobile applications to more powerful, cloud-optimized versions for complex reasoning tasks.

The models include:

  • Gemma 4 31B: A large, dense model designed for advanced reasoning, with a 256K context window. It supports text and vision modalities and targets high-end hardware such as the NVIDIA V100 32GB.
  • Gemma 4 26B 4A: This model utilizes a Mixture-of-Experts (MoE) architecture with 128 active experts, optimized for specialized efficiency and hyper-granular domain intelligence. It also boasts a 256K context window and supports text and vision modalities, running on MacBooks and cloud infrastructure.
  • Gemma 4 E4B: A smaller, dense model with a 128K context window, supporting text, vision, and audio inputs. It is designed for on-device applications, running on Pixel and Qualcomm chipsets.
  • Gemma 4 E2B: The most compact model in the family, also dense, with a 128K context window and supporting text, vision, and audio. It is optimized for on-device deployment on Pixel and Qualcomm hardware.
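
For quick comparison, the sketch below recaps the published specs as plain Python data; the short model keys are informal shorthand, not official identifiers, and the helper function is purely illustrative.

```python
# Purely illustrative recap of the published Gemma 4 specs as plain data.
# The short model keys are informal shorthand, not official identifiers.

GEMMA4_SPECS = {
    "31B":    {"arch": "dense", "context_k": 256, "modalities": {"text", "vision"}},
    "26B-4A": {"arch": "MoE",   "context_k": 256, "modalities": {"text", "vision"}},
    "E4B":    {"arch": "dense", "context_k": 128, "modalities": {"text", "vision", "audio"}},
    "E2B":    {"arch": "dense", "context_k": 128, "modalities": {"text", "vision", "audio"}},
}

def pick(needed: set[str], min_context_k: int) -> list[str]:
    """Models whose modalities cover `needed` and whose context window is large enough."""
    return [
        name for name, spec in GEMMA4_SPECS.items()
        if needed <= spec["modalities"] and spec["context_k"] >= min_context_k
    ]

print(pick({"text", "audio"}, 128))  # -> ['E4B', 'E2B']
```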

Architectural Innovations

DeepMind has introduced several key architectural enhancements in Gemma 4. The models use a 5:1 ratio of local to global layers: local layers attend over a sliding context window, while the final layer applies global attention across the full context. This design aims to balance computational efficiency with the ability to capture long-range dependencies.
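
To make the ratio concrete, here is a small illustrative sketch of what such a layer schedule could look like. Reading 5:1 as "every sixth layer is global" and the 12-layer count are our assumptions, not DeepMind's published configuration.

```python
# Illustrative local/global layer schedule. A 5:1 ratio is read here as
# "every sixth layer uses global attention"; the layer count is made up.

def attention_schedule(num_layers: int, local_per_global: int = 5) -> list[str]:
    """Return, per layer, whether it uses local (sliding-window) or global attention."""
    return [
        "global" if (i + 1) % (local_per_global + 1) == 0 else "local"
        for i in range(num_layers)
    ]

print(attention_schedule(12))
# ['local', 'local', 'local', 'local', 'local', 'global',
#  'local', 'local', 'local', 'local', 'local', 'global']
```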

For the smaller E2B and E4B models, a 4:1 ratio of local to global layers is used instead. Another significant improvement is grouped query attention (GQA), in which several query heads share a single set of key and value heads, reducing the memory needed for keys and values. The 26B 4A MoE model, for instance, groups 8 queries per key/value head, while the smaller dense models group 2. Furthermore, per-layer embedding tables give each token a distinct embedding at every layer, enhancing the model's ability to capture nuanced information.
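
A quick way to see what those group sizes mean for the attention heads, with invented head counts for illustration:

```python
# Grouped query attention (GQA) arithmetic: several query heads share one
# key/value head. Only the group sizes (8 and 2) come from the article;
# the query-head counts below are invented for illustration.

def num_kv_heads(num_query_heads: int, group_size: int) -> int:
    assert num_query_heads % group_size == 0, "query heads must divide evenly into groups"
    return num_query_heads // group_size

print(num_kv_heads(32, group_size=8))  # MoE-style grouping: 32 Q heads -> 4 KV heads
print(num_kv_heads(16, group_size=2))  # dense-style grouping: 16 Q heads -> 8 KV heads
```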

Performance and Benchmarks

Gemma 4 sets a new performance bar for open-source models. The 31B dense model, for example, achieved the #3 spot on the global Arena AI leaderboard for thinking tasks, outperforming models 20 times its size. The family demonstrates strong results across benchmarks including MMLU, AIME 2026, LiveCodeBench, GPQA Diamond, and v3-bench, often surpassing previous open-source models in both reasoning and coding tasks.

Apache 2.0 Licensing

A crucial aspect of the Gemma 4 release is its Apache 2.0 license. This permissive license significantly lowers the barrier to entry for developers and researchers, allowing for broad adoption and integration into a wide range of commercial and research applications. This move democratizes access to powerful AI capabilities, fostering further innovation within the AI community.

Getting Started with Gemma 4

Developers can get started with Gemma 4 in several ways:

  • Download and Self-Host: All model sizes are available for download and local hosting via platforms like Hugging Face and Kaggle. This provides full control over the model and its deployment; a minimal loading sketch follows this list.
  • Cloud Hosted: For quicker prototyping and enterprise-scale deployment, Gemma 4 models are accessible through AI Studio and Vertex AI. These platforms offer features like native function calling and direct JSON output integration, streamlining the development process.
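
For the self-hosting route, a minimal sketch using the Hugging Face transformers library is shown below. The checkpoint identifier is a guess at the naming convention, not a confirmed model ID; substitute whatever name the official model card lists.

```python
# Minimal self-hosting sketch via Hugging Face transformers.
# "google/gemma-4-e2b-it" is a hypothetical identifier; check the
# official model card for the real one.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-4-e2b-it"  # hypothetical identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer(
    "Explain grouped query attention in one sentence.",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```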

The models also support multimodal inputs, including audio, vision, and text, making them versatile for a variety of applications, from real-time analysis to complex creative tasks.
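
As one illustration of a multimodal call, the sketch below sends an image plus a text prompt through the google-genai SDK, the Python client for AI Studio. The model identifier is again an assumption; use whichever name AI Studio exposes.

```python
# Hedged sketch of a multimodal (image + text) request through the
# google-genai SDK. "gemma-4-e4b" is a hypothetical model identifier.

from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

with open("diagram.png", "rb") as f:
    image_part = types.Part.from_bytes(data=f.read(), mime_type="image/png")

response = client.models.generate_content(
    model="gemma-4-e4b",  # hypothetical identifier
    contents=[image_part, "Describe what this diagram shows."],
)
print(response.text)
```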
