Saturday, April 4, 2026

Google Gemma 4 Introduced: Now More Open and More Powerful

Google has officially announced Gemma 4, its next-generation family of open-weight models built on Gemini 3 technology. This time, the most striking change is not purely technical: there is also a fundamental shift in licensing.

Comes with four different models

The new Gemma 4 family consists of four distinct models designed to cater to different hardware levels. The 2 billion (E2B) and 4 billion (E4B) "Effective" models, developed for devices with more limited resources, specifically target smartphones and embedded systems. For more powerful systems, 26 billion parameter Mixture of Experts (MoE) and 31 billion parameter Dense models are offered.

According to Google's technical details, the 26B MoE model achieves high speed by actively using only 3.8 billion parameters during inference. This approach allows for a higher token generation rate compared to models of similar size. On the other hand, the 31B Dense model focuses on maximum accuracy and quality rather than speed.
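The speed advantage of the MoE design can be illustrated with simple back-of-the-envelope arithmetic. The sketch below uses only the parameter counts quoted in the announcement; the resulting ratios are rough illustrations, not measured throughput:

```python
# Rough per-token compute comparison between the 26B MoE and 31B dense
# models, using the parameter counts from the announcement.
# Illustrative only: real throughput also depends on hardware, memory
# bandwidth, and implementation details.

total_moe_params = 26e9    # 26B parameters stored in memory
active_moe_params = 3.8e9  # only 3.8B are used per generated token
dense_params = 31e9        # the dense model uses all parameters per token

active_fraction = active_moe_params / total_moe_params
compute_ratio = dense_params / active_moe_params

print(f"MoE activates {active_fraction:.1%} of its parameters per token")
print(f"The dense model does roughly {compute_ratio:.1f}x more work per token")
```

The full 26B parameters still have to fit in memory; the saving is in per-token compute, which is why the MoE variant trades some quality for a higher token generation rate.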

The larger models are designed to run in bfloat16 format on a single 80 GB Nvidia H100 GPU. When quantized to lower precision, they can also fit on consumer-grade GPUs.
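The hardware claim can be sanity-checked with quick weight-only arithmetic. The sketch below ignores activations, KV cache, and framework overhead, so real requirements are somewhat higher:

```python
# Approximate weight-only memory footprint of the 31B dense model at
# different precisions. Activations, KV cache, and runtime overhead are
# ignored, so actual memory use will be higher.

params = 31e9
bytes_per_param = {"bfloat16": 2, "int8": 1, "int4": 0.5}

for fmt, nbytes in bytes_per_param.items():
    gib = params * nbytes / 1024**3
    print(f"{fmt:>9}: ~{gib:.0f} GiB of weights")
```

At bfloat16 the weights alone come to roughly 58 GiB, which leaves headroom on an 80 GB H100; at 4-bit precision they shrink to about 14 GiB, within reach of consumer GPUs with 16-24 GB of VRAM.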

Local operation is prioritized

One of Gemma 4's most critical features is its significantly improved ability to run on local hardware. Google states that it has focused particularly on reducing latency. According to the company's statements, "nearly zero latency" has been achieved with smaller models.

The E2B and E4B models have been optimized for devices such as smartphones, Raspberry Pi, and Jetson Nano, thanks to collaborative efforts with Qualcomm and MediaTek. These models consume less memory and battery compared to the previous generation.

Support for over 140 languages

The Gemma 4 family is not limited to text. The models can process images and video, which enhances their use in areas such as OCR (optical character recognition) and graphic analysis. The smaller models also offer audio input and speech-recognition support.

Google states that the models have been trained in over 140 languages and offer large context windows. Edge models support 128,000 tokens, while larger models support 256,000 tokens of context.

One of Google's most ambitious claims is that Gemma 4 has made a significant leap in "intelligence per parameter." According to data shared by the company, the 31B model ranked third and the 26B model ranked sixth in the Arena AI rankings. This performance is noteworthy as they outperformed models 20 times their size.

Gemma 4 also offers improvements close to Gemini 3 levels in areas such as reasoning, mathematics, and instruction following. Additionally, it has been made ready for agentic workflows with features like built-in function calling, structured JSON output, and API integrations.
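To illustrate what built-in function calling typically looks like from the application side, the sketch below parses and routes a model reply that follows a common tool-call JSON convention. The `get_weather` tool and the reply payload are hypothetical; the announcement does not specify Gemma 4's exact wire format:

```python
import json

# Hypothetical example of a structured tool-call reply: a tool name plus
# JSON arguments. The tool, fields, and values are illustrative only;
# Gemma 4's actual function-calling format is not specified here.
model_reply = '{"tool": "get_weather", "arguments": {"city": "Berlin", "unit": "celsius"}}'

def dispatch(reply: str) -> str:
    """Parse a structured tool-call reply and route it to a local function."""
    call = json.loads(reply)
    if call["tool"] == "get_weather":
        args = call["arguments"]
        # A real agent would query a weather service here.
        return f"Weather lookup for {args['city']} in {args['unit']}"
    raise ValueError(f"Unknown tool: {call['tool']}")

print(dispatch(model_reply))
```

The point of structured JSON output is exactly this: the application can parse the reply deterministically instead of scraping free-form text.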

Furthermore, the new model family can generate code without an internet connection. Google emphasizes that Gemma 4, particularly its larger variants, can produce code whose quality approaches that of cloud-based solutions, given adequate hardware.

Switched to Apache 2.0 license

Perhaps the most critical change was not technical, but legal. Google abandoned the controversial proprietary license used in previous Gemma versions and switched to the Apache 2.0 license.

With this change, developers can freely modify the models, use them in commercial projects, and redistribute them on their own infrastructure or in the cloud.

Gemma 4 models are currently available for download via Hugging Face, Kaggle, and Ollama, and can be tested through Google AI Studio and the AI Edge Gallery. Although designed for local use, the models can also be run as a paid service on Google Cloud.
