Best LLMs for 10 GB VRAM

Author: Jakub Rusinowski · Last updated: July 30, 2026

These are the strongest local models that fit entirely in 10 GB of VRAM, ranked by capability. Each entry lists the quantization level and memory footprint needed to fit, along with an estimated tokens/sec figure.

GPUs at This Tier

Ranked Models

Qwen3-Coder — Qwen3-Coder 80B-A3B (MoE)	Q4_K_M · 7.5 GB · ~400 tok/s on Intel Arc B570
Qwen 3 — Qwen 3 30B-A3B (MoE)	Q4_K_M · 8 GB · ~400 tok/s on Intel Arc B570
Granite 3.0 — Granite 3.0 8B Instruct	Q4_K_M · 5.5 GB · ~69 tok/s on Intel Arc B570
Bonsai 27B — Ternary Bonsai 27B	Ternary (1.58-bit, ~1.71 bpw) · 5.9 GB · ~64 tok/s on Intel Arc B570
Qwen 2.5 Family — Qwen 2.5 7B Instruct	Q4_K_M · 4.8 GB · ~79 tok/s on Intel Arc B570
Qwen 3 — Qwen 3 8B	Q4_K_M · 5.5 GB · ~69 tok/s on Intel Arc B570
DeepSeek R1 — DeepSeek R1 Distill Llama 8B	Q4_K_M · 5.8 GB · ~66 tok/s on Intel Arc B570
Qwen3-Coder — Qwen3-Coder 8B	Q4_K_M · 5.5 GB · ~69 tok/s on Intel Arc B570
Mistral Family — Mistral NeMo 12B	Q4_K_M · 8.5 GB · ~45 tok/s on Intel Arc B570
Gemma 4 (Legacy Listing — Unverified) — Gemma 4 12B	Q4_K_M · 7.8 GB · ~49 tok/s on Intel Arc B570
Gemma 3 — Gemma 3 12B Instruct	Q4_K_M · 8.1 GB · ~47 tok/s on Intel Arc B570
Qwen 3.5 (Legacy Listing — Unverified) — Qwen 3.5 7B	Q4_K_M · 4.8 GB · ~79 tok/s on Intel Arc B570
GLM-6 — GLM-6 9B	Q4_K_M · 6 GB · ~63 tok/s on Intel Arc B570
InternLM 3 — InternLM 3 8B Instruct	Q4_K_M · 5.5 GB · ~69 tok/s on Intel Arc B570
Qwen 3.5 — Qwen 3.5 9B	Q4_K_M · 6.6 GB · ~58 tok/s on Intel Arc B570
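
As a rough cross-check on the sizes above, the weight footprint can be estimated from parameter count and bits per weight. The sketch below is illustrative only: the function names and the 1 GB default headroom are assumptions, not something this page specifies, and real files also carry metadata and mixed-precision tensors, so listed sizes will differ slightly from the estimate.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    # billions of parameters * bits per weight / 8 bits per byte = gigabytes
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         vram_gb: float = 10.0, headroom_gb: float = 1.0) -> bool:
    """True if the weights leave at least headroom_gb free.

    headroom_gb is an assumed allowance for KV cache and runtime buffers.
    """
    return weight_footprint_gb(params_billion, bits_per_weight) + headroom_gb <= vram_gb


# Ternary Bonsai 27B from the list: 27B parameters at ~1.71 bpw.
bonsai_gb = weight_footprint_gb(27, 1.71)
print(f"{bonsai_gb:.2f} GB")  # 5.77 GB, close to the 5.9 GB listed
print(fits(27, 1.71))         # True
```

The same arithmetic shows why quantization level matters at this tier: a dense 27B model at about 4.5 bits per weight would need roughly 15 GB for weights alone.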

FAQ

What LLMs run well with 10 GB VRAM?

Qwen3-Coder, Qwen 3, Granite 3.0, Bonsai 27B, and the Qwen 2.5 family all fit in 10 GB of VRAM at the quantization levels listed above.
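
Fitting the weights is only part of running well, since context (KV cache) also needs memory. A minimal sketch of filtering listed models by how much VRAM they leave free follows. The sizes are copied from a few rows of the ranked list; the `with_headroom` helper and its 2 GB threshold are hypothetical choices for illustration.

```python
# (model, size in GB at the listed quantization), copied from the ranked list
LISTED = [
    ("Granite 3.0 8B Instruct", 5.5),
    ("Ternary Bonsai 27B", 5.9),
    ("Qwen 2.5 7B Instruct", 4.8),
    ("Mistral NeMo 12B", 8.5),
    ("Gemma 3 12B Instruct", 8.1),
]


def with_headroom(models, vram_gb=10.0, min_free_gb=2.0):
    """Return names of models that leave at least min_free_gb of VRAM unused.

    min_free_gb is an assumed budget for KV cache and buffers, not a measured value.
    """
    return [name for name, size_gb in models if vram_gb - size_gb >= min_free_gb]


for name in with_headroom(LISTED):
    print(name)
```

With these assumed numbers, the two 12B models still fit in 10 GB but fall below the 2 GB free-memory threshold, so they would likely be limited to shorter contexts.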

Which GPUs have 10 GB VRAM?

The Intel Arc B570 and NVIDIA GeForce RTX 3080 (10GB) have exactly 10 GB. The Intel Arc B580 and NVIDIA GeForce RTX 3060 (12GB) are 12 GB cards, so they also run every model listed here.
