T5Gemma 2 has arrived, marking a significant evolution in encoder-decoder models. This release introduces the first multimodal and long-context capabilities to the T5Gemma family, leveraging innovations from the Gemma 3 architecture. It signals a renewed focus on efficiency and specialized applications for this often-overlooked model paradigm, particularly for on-device deployment.
More than a simple retraining, T5Gemma 2 incorporates crucial architectural shifts designed for maximum efficiency at smaller scales. According to the announcement, tying the word embeddings between the encoder and decoder significantly reduces the overall parameter count. This optimization lets developers pack more active capacity into the same memory footprint, a critical advantage for compact models like the new 270M-270M variant. Furthermore, the decoder merges self- and cross-attention into a single attention mechanism, which reduces architectural complexity, simplifies model parallelization, and benefits inference speed.
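To make the tying concrete, below is a minimal PyTorch sketch of a shared embedding table. All names here are hypothetical, and tying the output logit projection as well is our own assumption for illustration; the announcement only confirms that encoder and decoder embeddings are shared.

```python
import torch
import torch.nn as nn

class TiedSeq2SeqEmbeddings(nn.Module):
    """One embedding table shared by the encoder input, the decoder
    input, and (an extra assumption here) the output logit projection."""

    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        # A single table replaces separate encoder/decoder tables,
        # saving vocab_size * d_model parameters per tied copy.
        self.shared = nn.Embedding(vocab_size, d_model)

    def encode_tokens(self, enc_ids: torch.Tensor) -> torch.Tensor:
        return self.shared(enc_ids)

    def decode_tokens(self, dec_ids: torch.Tensor) -> torch.Tensor:
        return self.shared(dec_ids)

    def logits(self, hidden: torch.Tensor) -> torch.Tensor:
        # Weight tying: reuse the embedding matrix as the output head.
        return hidden @ self.shared.weight.T

# Toy sizes for the demo. At a Gemma-like scale (a vocabulary in the
# hundreds of thousands), each untied copy of this table would cost
# well over 100M parameters, a dominant share of a 270M budget.
emb = TiedSeq2SeqEmbeddings(vocab_size=1000, d_model=64)
ids = torch.randint(0, 1000, (2, 8))
hidden = emb.decode_tokens(ids)  # (2, 8, 64)
logits = emb.logits(hidden)      # (2, 8, 1000)
```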
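The merged attention mechanism can be sketched in a similar spirit. The snippet below assumes "merged" means a single attention operation whose keys and values concatenate the encoder outputs with the decoder's own causally masked states, replacing the usual back-to-back self- and cross-attention sub-layers; this is one plausible reading of the announcement, not the released implementation.

```python
import torch
import torch.nn as nn

class MergedAttention(nn.Module):
    """Hypothetical decoder sub-layer: one attention op over the
    concatenation [encoder states ; decoder states]."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, dec_h: torch.Tensor, enc_h: torch.Tensor) -> torch.Tensor:
        # One projection set and one softmax over encoder + decoder keys,
        # instead of two sequential attention sub-layers.
        kv = torch.cat([enc_h, dec_h], dim=1)
        t_enc, t_dec = enc_h.size(1), dec_h.size(1)
        # Decoder queries see every encoder position but only past and
        # current decoder positions; True marks a blocked slot.
        mask = torch.zeros(t_dec, t_enc + t_dec, dtype=torch.bool)
        mask[:, t_enc:] = torch.triu(
            torch.ones(t_dec, t_dec, dtype=torch.bool), diagonal=1
        )
        out, _ = self.attn(dec_h, kv, kv, attn_mask=mask)
        return out

layer = MergedAttention(d_model=64, n_heads=8)
dec = torch.randn(2, 16, 64)  # (batch, decoder length, width)
enc = torch.randn(2, 32, 64)  # (batch, encoder length, width)
out = layer(dec, enc)         # (2, 16, 64)
```

Collapsing two sub-layers into one leaves fewer distinct modules per decoder block, which is what makes the layer easier to shard across devices and cheaper to run at inference time.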
