Gemma 4 — Local AI Model by Google DeepMind

Gemma 4 is an open-weight model family from Google DeepMind. The 27B variant is claimed to deliver GPT-4-level performance in 14 GB of VRAM with Q4_K_M quantization, which puts it within reach of consumer GPUs. It features a hybrid architecture with interleaved local and global attention, multi-image understanding, and a 128k-token context window. Reported throughput is 85 tokens/second on an RTX 4090.

Hardware Requirements

Gemma 4 4B · Min 4 GB VRAM · Q4_K_M · 128,000 ctx · ollama run gemma4:4b
Gemma 4 12B · Min 8 GB VRAM · Q4_K_M · 128,000 ctx · ollama run gemma4:12b
Gemma 4 27B ⭐ · Min 14 GB VRAM · Q4_K_M · 128,000 ctx · ollama run gemma4:27b
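
Choosing a variant from the list above comes down to comparing available VRAM against each listed minimum. The sketch below is a hypothetical helper, not part of Ollama or Gemma. The names MIN_VRAM_GB and pick_variant are illustrative, and the figures are copied from the list above rather than measured.

```python
from typing import Optional

# Minimum VRAM in GB per Ollama tag, copied from the list above (Q4_K_M builds).
# These are the page's stated minimums, not measured values.
MIN_VRAM_GB = {
    "gemma4:4b": 4,
    "gemma4:12b": 8,
    "gemma4:27b": 14,
}


def pick_variant(available_vram_gb: float) -> Optional[str]:
    """Return the tag with the highest listed minimum that still fits, or None."""
    fitting = [
        (need, tag)
        for tag, need in MIN_VRAM_GB.items()
        if need <= available_vram_gb
    ]
    if not fitting:
        return None
    # Tuples compare by required VRAM first, so max() picks the largest variant.
    return max(fitting)[1]
```

For example, pick_variant(24) returns "gemma4:27b", and pick_variant(3) returns None because no listed variant fits. The listed minimums leave no explicit headroom for long contexts, so treating them as a lower bound is probably wise.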

How to Run Locally

Install Ollama, then run: ollama run gemma4:4b

Minimum VRAM for the 4B variant: 4 GB. For best results, use Q4_K_M quantization.
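
Once a model has been pulled, it can also be queried from a script instead of the interactive prompt. The following is a minimal sketch, assuming the Ollama server is running locally on its default port 11434 and exposing its /api/generate endpoint. The helper names build_request and generate are hypothetical, and the gemma4:4b tag is taken from the command above.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "gemma4:4b") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "gemma4:4b", timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False the server replies with one JSON object
    # whose "response" field holds the full completion.
    return body["response"]
```

Calling generate("Summarize local attention in one sentence.") should return the model's reply as a string, provided the server is up and the model has been pulled. Otherwise urllib raises a connection or HTTP error.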