Gemma 3n — Local AI Model by Google DeepMind

Written by Jakub Rusinowski · Last updated June 15, 2026

Google's mobile-first multimodal model family. Uses the novel MatFormer nested architecture — a single model file contains multiple sub-models (E2B/E4B) that can run at different sizes. Processes text, images, audio, and video. Runs on phones without internet.

Hardware Requirements

Gemma 3n E2B	Min 2 GB VRAM · Q4_K_M · 32,768 ctx · `ollama run gemma3n:e2b`
Gemma 3n E4B	Min 3 GB VRAM · Q4_K_M · 32,768 ctx · `ollama run gemma3n:e4b`

How to Run Locally

Install Ollama then run: ollama run gemma3n:e2b

Minimum VRAM: 2 GB. For best results use Q4_K_M quantization.

Gemma 3n — Frequently Asked Questions

How much VRAM does Gemma 3n need?

Gemma 3n needs about 2 GB VRAM at Q4_K_M quantization for its smallest variant. Variants: Gemma 3n E2B (2 GB, Q4_K_M); Gemma 3n E4B (3 GB, Q4_K_M). On Apple Silicon, unified memory counts toward this requirement.

Can I run Gemma 3n on an RTX 4090 (24 GB)?

Yes — Gemma 3n runs on an RTX 4090 (24 GB) and other 24 GB cards such as the RTX 3090. Smaller variants also fit comfortably on 8–16 GB GPUs at Q4_K_M.

What quantization should I use for Gemma 3n?

Q4_K_M is the best balance of quality and VRAM for Gemma 3n in most cases. Choose Q8_0 for near-lossless quality if you have spare VRAM, or smaller quants (Q3/Q2) only when memory is tight.

How do I run Gemma 3n with Ollama?

Install Ollama, then run: ollama run gemma3n:e2b. This downloads Gemma 3n and starts a local, OpenAI-compatible endpoint — no internet connection is needed after the initial download.