Llama 4 — Local AI Model by Meta

Llama 4 is Meta's Mixture-of-Experts (MoE) model series. It uses a sparse MoE architecture in which only a fraction of the parameters are activated for each token, delivering frontier-class performance at a per-token compute cost far below what the total parameter count suggests.
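
To make the sparse-routing idea concrete, here is a toy sketch of a single MoE layer in Python with NumPy. It is purely illustrative and not Meta's implementation; the expert count, top-1 routing, and layer sizes are assumptions chosen for readability:

# Toy illustration of sparse MoE routing (not Meta's implementation):
# a router scores each expert per token, only the top-k experts run,
# so compute per token scales with active parameters, not total parameters.
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 64, 16, 1      # assumption: 16 experts, top-1 routing
W_router = rng.standard_normal((d_model, n_experts)) * 0.02
experts = [
    (rng.standard_normal((d_model, 4 * d_model)) * 0.02,   # up-projection
     rng.standard_normal((4 * d_model, d_model)) * 0.02)   # down-projection
    for _ in range(n_experts)
]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """x: (n_tokens, d_model) -> (n_tokens, d_model), routing each token to top_k experts."""
    logits = x @ W_router                              # (n_tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)         # softmax over experts
    top = np.argsort(-probs, axis=-1)[:, :top_k]       # indices of the chosen experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in top[t]:
            w_up, w_down = experts[e]
            h = np.maximum(x[t] @ w_up, 0.0)           # expert FFN (ReLU stand-in)
            out[t] += probs[t, e] * (h @ w_down)       # weight output by router probability
    return out

tokens = rng.standard_normal((8, d_model))
print(moe_layer(tokens).shape)                         # (8, 64)

Because only the chosen experts' weights are multiplied per token, the cost of a forward pass tracks the active parameter count even though every expert must remain loaded in memory.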

Hardware Requirements

Llama 4 Scout 17B · Min 10 GB VRAM · Q4_K_M · 10,000,000 ctx · ollama run llama4:scout
Llama 4 Maverick 17B · Min 24 GB VRAM · Q4_K_M · 1,000,000 ctx · ollama run llama4:maverick
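
The "17B" in the model names refers to the parameters active per token; the full expert set is much larger and must still be stored. As a rough sketch of the quantized weight footprint, assuming Meta's publicly reported totals of about 109B parameters for Scout and 400B for Maverick, and roughly 4.5 bits per weight for Q4_K_M (both figures are assumptions, not taken from the table above):

# Back-of-the-envelope estimate of quantized weight size.
# Assumed values: ~109B total parameters for Scout, ~400B for Maverick,
# and roughly 4.5 bits per weight for Q4_K_M quantization.
def approx_weight_gb(total_params: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of the quantized weights in GB."""
    return total_params * bits_per_weight / 8 / 1e9

for name, params in [("llama4:scout", 109e9), ("llama4:maverick", 400e9)]:
    print(f"{name}: ~{approx_weight_gb(params):.0f} GB of weights")

Because the quantized weights are much larger than the VRAM minimums listed, Ollama will typically split the model between GPU VRAM and system RAM, so plan for substantial system memory in addition to the VRAM shown above.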

How to Run Locally

Install Ollama, then run: ollama run llama4:scout (the first run downloads the model weights).

Minimum VRAM: 10 GB for the Scout variant. For best results, use Q4_K_M quantization.
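
Once the model is running, it can also be called programmatically. The sketch below assumes the default Ollama server is listening on http://localhost:11434 and uses its /api/generate endpoint; the prompt and timeout are placeholder values:

# Minimal sketch of querying a locally running Llama 4 model through
# Ollama's REST API. Assumes `ollama run llama4:scout` (or `ollama pull`)
# has already downloaded the model and the Ollama server is running.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama4:scout",
        "prompt": "Summarize the Mixture-of-Experts idea in one sentence.",
        "stream": False,          # return a single JSON object instead of a token stream
    },
    timeout=600,                  # placeholder: large models can take a while to load
)
resp.raise_for_status()
print(resp.json()["response"])

Leaving out "stream": False makes the endpoint return a stream of partial responses, which is usually what you want for interactive chat interfaces.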