Google DeepMind is pushing advanced AI capabilities directly to consumer hardware with the launch of its Gemma 4 12B model. This new offering aims to bring multimodal intelligence, capable of understanding images and audio alongside text, to laptops without relying on cloud processing.
The 12-billion-parameter model sits between Google's smaller, edge-focused E4B and its larger 26B Mixture of Experts model. Its key innovation is a unified, encoder-free architecture.
No More Encoding Layers
Traditional multimodal AI systems typically use separate encoder modules to process different data types like images or audio before feeding them into a core language model. Gemma 4 12B eliminates these intermediate steps.
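
To make the contrast concrete, here is a toy sketch of what "encoder-free" can mean in general: raw image patches are mapped by a single linear projection into the same embedding space as text tokens, and the combined sequence goes straight to the core model with no separate vision encoder in between. This is not Gemma 4 12B's actual implementation, which is not detailed here. Every name, dimension, and weight below (`EMBED_DIM`, `build_sequence`, the random matrices) is a hypothetical stand-in chosen for illustration.

```python
import random

EMBED_DIM = 8    # hypothetical model width, for illustration only
VOCAB_SIZE = 16  # hypothetical vocabulary size


def make_matrix(rows, cols, seed):
    """Build a rows x cols matrix of random stand-in weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def project(vector, matrix):
    """Linear projection: multiply a vector by a len(vector) x out_dim matrix."""
    out_dim = len(matrix[0])
    return [sum(v * matrix[i][j] for i, v in enumerate(vector)) for j in range(out_dim)]


def split_into_patches(pixels, patch_size):
    """Cut a flat list of pixel values into consecutive fixed-size patches."""
    return [pixels[i:i + patch_size] for i in range(0, len(pixels), patch_size)]


def build_sequence(token_ids, pixels, patch_size=4):
    """Return one sequence of embeddings covering both image and text.

    Assumed encoder-free path: each raw patch gets a single linear
    projection into the text embedding space. An encoder-based system
    would instead run the patches through a separate vision model first.
    """
    token_table = make_matrix(VOCAB_SIZE, EMBED_DIM, seed=0)
    patch_weights = make_matrix(patch_size, EMBED_DIM, seed=1)

    image_part = [project(p, patch_weights) for p in split_into_patches(pixels, patch_size)]
    text_part = [token_table[t] for t in token_ids]
    return image_part + text_part  # one shared sequence for the core model


sequence = build_sequence(token_ids=[1, 2, 3], pixels=[0.1] * 8)
print(len(sequence), "embeddings of width", len(sequence[0]))
```

In this sketch the image contributes two patch embeddings and the text contributes three token embeddings, all of the same width, so a single model can attend over them together.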
