Best LLMs for 10 GB VRAM

Written by Jakub Rusinowski · Last updated June 15, 2026

These are the strongest local models that fit entirely in 10 GB of VRAM, ranked by capability. Each entry lists the quantization level, the VRAM required at that level, and an estimated tokens/sec.
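As a rough way to reason about "fits entirely in VRAM", the sketch below estimates the size of quantized weights from the parameter count and adds a fixed allowance for the KV cache and runtime buffers. The figure of about 4.85 bits per weight for Q4_K_M and the 1 GB overhead are assumptions chosen for illustration, not values taken from this page, and the helper names are hypothetical.

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Approximate quantized weight size in GB (1 GB = 1e9 bytes).

    bits_per_weight=4.85 is an assumed average for Q4_K_M, not a measured value.
    """
    return params_billion * bits_per_weight / 8


def fits_in_vram(
    params_billion: float,
    vram_gb: float = 10.0,
    overhead_gb: float = 1.0,  # assumed allowance for KV cache and buffers
    bits_per_weight: float = 4.85,
) -> bool:
    """Return True if the estimated weights plus overhead fit in vram_gb."""
    needed = estimate_weights_gb(params_billion, bits_per_weight) + overhead_gb
    return needed <= vram_gb


if __name__ == "__main__":
    for size in (8, 12, 24):
        weights = estimate_weights_gb(size)
        verdict = "fits" if fits_in_vram(size) else "does not fit"
        print(f"{size}B dense at Q4_K_M: ~{weights:.1f} GB weights, {verdict} in 10 GB")
```

Real usage depends on context length and the runtime, so treat the result as a first-pass filter rather than a guarantee.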

GPUs at This Tier

Ranked Models

Qwen3-Coder: Qwen3-Coder 80B-A3B (MoE) · Q4_K_M · 7.5 GB · ~400 tok/s on Intel Arc B570
Qwen 3: Qwen 3 30B-A3B (MoE) · Q4_K_M · 8 GB · ~400 tok/s on Intel Arc B570
Granite 3.0: Granite 3.0 8B Instruct · Q4_K_M · 5.5 GB · ~69 tok/s on Intel Arc B570
Qwen 2.5 Family: Qwen 2.5 7B Instruct · Q4_K_M · 4.8 GB · ~79 tok/s on Intel Arc B570
Qwen 3: Qwen 3 8B · Q4_K_M · 5.5 GB · ~69 tok/s on Intel Arc B570
DeepSeek R1: DeepSeek R1 Distill Llama 8B · Q4_K_M · 5.8 GB · ~66 tok/s on Intel Arc B570
Qwen3-Coder: Qwen3-Coder 8B · Q4_K_M · 5.5 GB · ~69 tok/s on Intel Arc B570
Mistral Family: Mistral NeMo 12B · Q4_K_M · 8.5 GB · ~45 tok/s on Intel Arc B570
Gemma 4 (Legacy Listing, Unverified): Gemma 4 12B · Q4_K_M · 7.8 GB · ~49 tok/s on Intel Arc B570
Gemma 3: Gemma 3 12B Instruct · Q4_K_M · 8.1 GB · ~47 tok/s on Intel Arc B570
Qwen 3.5 (Legacy Listing, Unverified): Qwen 3.5 7B · Q4_K_M · 4.8 GB · ~79 tok/s on Intel Arc B570
InternLM 3: InternLM 3 8B Instruct · Q4_K_M · 5.5 GB · ~69 tok/s on Intel Arc B570
Qwen 3.5: Qwen 3.5 9B · Q4_K_M · 6.6 GB · ~58 tok/s on Intel Arc B570
Yi 1.5 Family: Yi 1.5 9B Chat · Q4_K_M · 6.2 GB · ~61 tok/s on Intel Arc B570
Falcon 3: Falcon 3 10B Instruct · Q4_K_M · 6.5 GB · ~58 tok/s on Intel Arc B570
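
The tokens/sec figures for the dense models above look consistent with a simple bandwidth-bound estimate: each generated token reads the full set of weights once, so throughput is roughly memory bandwidth divided by model size. The sketch below assumes 380 GB/s for the Intel Arc B570, which is an assumption and not stated on this page; the function name is hypothetical. It does not apply to the MoE entries, where only the active experts are read per token.

```python
# Assumed memory bandwidth of the Intel Arc B570 in GB/s (not stated in the listing).
B570_BANDWIDTH_GBPS = 380.0


def estimate_decode_tok_s(model_gb: float, bandwidth_gbps: float = B570_BANDWIDTH_GBPS) -> float:
    """Bandwidth-bound decode estimate: one full pass over the weights per token."""
    return bandwidth_gbps / model_gb


# (size in GB, listed tok/s) for three dense entries copied from the ranking.
listed = {
    "Granite 3.0 8B Instruct": (5.5, 69),
    "Qwen 2.5 7B Instruct": (4.8, 79),
    "Mistral NeMo 12B": (8.5, 45),
}

if __name__ == "__main__":
    for name, (size_gb, listed_tok_s) in listed.items():
        estimate = estimate_decode_tok_s(size_gb)
        print(f"{name}: estimated ~{estimate:.0f} tok/s, listed ~{listed_tok_s} tok/s")
```

This is an upper-bound style estimate; measured throughput is usually lower once compute, context length, and driver overhead are involved.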

FAQ

What LLMs run well with 10 GB VRAM?

The top-ranked picks are Qwen3-Coder, Qwen 3, Granite 3.0, and Qwen 2.5. At Q4_K_M quantization, all of them fit in 10 GB of VRAM.

Which GPUs have 10 GB VRAM?

The Intel Arc B570 and NVIDIA GeForce RTX 3080 (10GB) have exactly 10 GB. The Intel Arc B580 and NVIDIA GeForce RTX 3060 (12GB) have 12 GB, so they can also run every model at this tier.
