Best LLMs for 12 GB VRAM

Author: Jakub Rusinowski · Last updated: June 15, 2026

These are the strongest local models that fit entirely in 12 GB of VRAM, ranked by capability. Each entry lists the quantization level needed to fit, the resulting size in VRAM, and the estimated tokens/sec on a reference GPU.

GPUs at This Tier

Ranked Models

Qwen 2.5 Family: Qwen 2.5 14B Instruct · Q4_K_M · 9.5 GB · ~48 tok/s on Intel Arc B580
Qwen 3: Qwen 3 14B · Q4_K_M · 9.5 GB · ~48 tok/s on Intel Arc B580
DeepSeek R1: DeepSeek R1 Distill Qwen 14B · Q4_K_M · 9.2 GB · ~50 tok/s on Intel Arc B580
Phi-4 Family: Phi-4 (14B) · Q4_K_M · 9.2 GB · ~50 tok/s on Intel Arc B580
Qwen3-Coder: Qwen3-Coder 80B-A3B (MoE) · Q4_K_M · 7.5 GB · ~400 tok/s on Intel Arc B580
Qwen 3: Qwen 3 30B-A3B (MoE) · Q4_K_M · 8 GB · ~400 tok/s on Intel Arc B580
Granite 3.0: Granite 3.0 8B Instruct · Q4_K_M · 5.5 GB · ~83 tok/s on Intel Arc B580
Qwen 3.5 (Legacy Listing, Unverified): Qwen 3.5 14B · Q4_K_M · 9.5 GB · ~48 tok/s on Intel Arc B580
Qwen 2.5 Family: Qwen 2.5 7B Instruct · Q4_K_M · 4.8 GB · ~95 tok/s on Intel Arc B580
Qwen 3: Qwen 3 8B · Q4_K_M · 5.5 GB · ~83 tok/s on Intel Arc B580
DeepSeek R1: DeepSeek R1 Distill Llama 8B · Q4_K_M · 5.8 GB · ~79 tok/s on Intel Arc B580
Qwen3-Coder: Qwen3-Coder 8B · Q4_K_M · 5.5 GB · ~83 tok/s on Intel Arc B580
Mistral Family: Mistral NeMo 12B · Q4_K_M · 8.5 GB · ~54 tok/s on Intel Arc B580
Qwen 3.5 (Legacy Listing, Unverified): Qwen 3.5 14B · Q4_K_M · 9 GB · ~51 tok/s on Intel Arc B580
Gemma 4 (Legacy Listing, Unverified): Gemma 4 12B · Q4_K_M · 7.8 GB · ~58 tok/s on Intel Arc B580
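
The tok/s figures for the dense models in this list appear consistent with a simple memory-bandwidth bound: generating one token reads roughly the full set of weights once, so decode speed is about memory bandwidth divided by model size. The sketch below illustrates that rule of thumb. It assumes the Arc B580's published 456 GB/s memory bandwidth (not stated on this page), the function name is hypothetical, and nothing here confirms that this is how the listing's estimates were actually produced.

```python
def estimate_tokens_per_sec(model_size_gb: float, bandwidth_gb_s: float) -> float:
    """Rough upper bound on decode speed for a dense model.

    Assumes every generated token requires reading all weights once,
    so throughput is limited by memory bandwidth, not compute.
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# Assumption: Intel Arc B580 published memory bandwidth, in GB/s.
ARC_B580_BANDWIDTH_GB_S = 456.0

# Sizes taken from the ranked list (Q4_K_M).
listed_models = [
    ("Qwen 2.5 14B Instruct", 9.5),
    ("Granite 3.0 8B Instruct", 5.5),
    ("Qwen 2.5 7B Instruct", 4.8),
]

for name, size_gb in listed_models:
    speed = estimate_tokens_per_sec(size_gb, ARC_B580_BANDWIDTH_GB_S)
    print(f"{name}: ~{speed:.0f} tok/s")
```

The two MoE entries do not follow this dense-model rule, presumably because only the active parameters are read per token. Real throughput also depends on the runtime, context length, and driver, so treat these numbers as ceilings rather than measurements.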

FAQ

What LLMs run well with 12 GB VRAM?

Models from the Qwen 2.5 Family, Qwen 3, DeepSeek R1, Phi-4 Family, and Qwen3-Coder all fit in 12 GB of VRAM at Q4_K_M quantization.
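
Whether a model "fits" depends on more than the size of its weights, because the KV cache and runtime buffers also occupy VRAM. A rough check can be sketched as follows. The 1.5 GB headroom is an illustrative assumption, not a figure from this page, and the function name is hypothetical. Actual overhead grows with context length.

```python
def fits_in_vram(weights_gb: float, vram_gb: float = 12.0, headroom_gb: float = 1.5) -> bool:
    """Return True if the weights plus a fixed headroom fit in the given VRAM.

    headroom_gb is a placeholder for KV cache and runtime buffers;
    1.5 GB is an assumed value, tune it for your context length.
    """
    if weights_gb < 0 or headroom_gb < 0:
        raise ValueError("sizes must be non-negative")
    return weights_gb + headroom_gb <= vram_gb


# Largest Q4_K_M size in the list: 9.5 GB (e.g. Qwen 2.5 14B Instruct).
print(fits_in_vram(9.5))   # 9.5 + 1.5 = 11.0 GB, within 12 GB
# A hypothetical 11 GB model would leave too little room for the cache.
print(fits_in_vram(11.0))  # 11.0 + 1.5 = 12.5 GB, over 12 GB
```

With this headroom, every size in the list passes, while anything above 10.5 GB of weights would not.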

Which GPUs have 12 GB VRAM?

GPUs with 12 GB of VRAM include the Intel Arc B580, NVIDIA GeForce RTX 3060 (12 GB), NVIDIA GeForce RTX 5070, and NVIDIA GeForce RTX 4070 Super.
