Best LLMs for 8 GB VRAM

By Jakub Rusinowski · Last updated: June 15, 2026

These are the strongest local models that fit entirely in 8 GB of VRAM, ranked by capability. Each entry lists the quantization level needed to fit, the resulting model size, and an estimated generation speed in tokens/sec.

GPUs at This Tier

Ranked Models

Granite 3.0: Granite 3.0 8B Instruct · Q4_K_M · 5.5 GB · ~81 tok/s on NVIDIA GeForce RTX 5060
Qwen 2.5 Family: Qwen 2.5 7B Instruct · Q4_K_M · 4.8 GB · ~93 tok/s on NVIDIA GeForce RTX 5060
Qwen 3: Qwen 3 8B · Q4_K_M · 5.5 GB · ~81 tok/s on NVIDIA GeForce RTX 5060
DeepSeek R1: DeepSeek R1 Distill Llama 8B · Q4_K_M · 5.8 GB · ~77 tok/s on NVIDIA GeForce RTX 5060
Qwen3-Coder: Qwen3-Coder 8B · Q4_K_M · 5.5 GB · ~81 tok/s on NVIDIA GeForce RTX 5060
Qwen 3.5 (Legacy Listing, Unverified): Qwen 3.5 7B · Q4_K_M · 4.8 GB · ~93 tok/s on NVIDIA GeForce RTX 5060
InternLM 3: InternLM 3 8B Instruct · Q4_K_M · 5.5 GB · ~81 tok/s on NVIDIA GeForce RTX 5060
Qwen 3.5: Qwen 3.5 9B · Q4_K_M · 6.6 GB · ~68 tok/s on NVIDIA GeForce RTX 5060
Yi 1.5 Family: Yi 1.5 9B Chat · Q4_K_M · 6.2 GB · ~72 tok/s on NVIDIA GeForce RTX 5060
Falcon 3: Falcon 3 10B Instruct · Q4_K_M · 6.5 GB · ~69 tok/s on NVIDIA GeForce RTX 5060
GLM-4.7 / GLM-Z1: GLM-4.7 9B · Q4_K_M · 6.2 GB · ~72 tok/s on NVIDIA GeForce RTX 5060
GLM-5 / GLM-5.1: GLM-5 9B · Q4_K_M · 6 GB · ~75 tok/s on NVIDIA GeForce RTX 5060
Gemma 2 Family: Gemma 2 9B IT · Q4_K_M · 6.8 GB · ~66 tok/s on NVIDIA GeForce RTX 5060
Llama 3.1 Family: Llama 3.1 8B Instruct · Q4_K_M · 6.5 GB · ~69 tok/s on NVIDIA GeForce RTX 5060
IBM Granite 4.1: Granite 4.1 8B · Q4_K_M · 5 GB · ~90 tok/s on NVIDIA GeForce RTX 5060
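
The speed figures in this list look consistent with a simple memory-bandwidth-bound rule of thumb, in which generating each token requires reading the full set of weights once. The sketch below illustrates that rule; it is not a statement of how the figures were actually produced. The 448 GB/s bandwidth value assumed for the RTX 5060 and the function name are assumptions introduced here, and real throughput also depends on context length, runtime, and drivers.

```python
# Rough decode-speed estimate: tokens/sec ~= memory bandwidth / model size.
# ASSUMPTION: 448 GB/s is taken as the RTX 5060's memory bandwidth.
ASSUMED_BANDWIDTH_GB_S = 448.0


def estimate_tok_per_s(model_size_gb: float,
                       bandwidth_gb_s: float = ASSUMED_BANDWIDTH_GB_S) -> float:
    """Upper-bound style estimate for a bandwidth-bound decoder."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


if __name__ == "__main__":
    # Sizes (GB) copied from a few entries in the list above.
    for name, size_gb in [
        ("Qwen 2.5 7B Instruct", 4.8),
        ("Granite 3.0 8B Instruct", 5.5),
        ("Gemma 2 9B IT", 6.8),
    ]:
        print(f"{name}: ~{round(estimate_tok_per_s(size_gb))} tok/s")
```

Under this assumption, a smaller quantized file translates directly into a higher estimated speed, which is why the 4.8 GB entries lead the list on tokens/sec.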

FAQ

What LLMs run well with 8 GB VRAM?

Granite 3.0, Qwen 2.5 Family, Qwen 3, DeepSeek R1, and Qwen3-Coder all fit in 8 GB of VRAM at Q4_K_M quantization.
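
As a minimal sketch of what "fits" means here, the check below compares a model's listed size against the card's VRAM while holding back some memory for the KV cache and runtime overhead. It treats the listed sizes as the weights' footprint, which is an assumption, and the 1 GB reserve and the function names are hypothetical values chosen for illustration.

```python
VRAM_GB = 8.0
# ASSUMPTION: hypothetical allowance for KV cache and runtime overhead.
ASSUMED_RESERVE_GB = 1.0

# Sizes (GB) at Q4_K_M, copied from the ranked list.
MODELS = {
    "Granite 3.0 8B Instruct": 5.5,
    "Qwen 2.5 7B Instruct": 4.8,
    "Qwen 3 8B": 5.5,
    "DeepSeek R1 Distill Llama 8B": 5.8,
    "Qwen3-Coder 8B": 5.5,
}


def headroom_gb(size_gb: float, vram_gb: float = VRAM_GB) -> float:
    """VRAM left over after loading the weights."""
    return vram_gb - size_gb


def fits(size_gb: float, vram_gb: float = VRAM_GB,
         reserve_gb: float = ASSUMED_RESERVE_GB) -> bool:
    """True if weights plus the assumed reserve fit in VRAM."""
    return size_gb + reserve_gb <= vram_gb


if __name__ == "__main__":
    for name, size_gb in MODELS.items():
        print(f"{name}: fits={fits(size_gb)}, "
              f"headroom={headroom_gb(size_gb):.1f} GB")
```

Longer contexts need a larger reserve, so a model with little headroom may fit at short context lengths and still spill out of VRAM at long ones.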

Which GPUs have 8 GB VRAM?

GPUs with 8 GB of VRAM include the NVIDIA GeForce RTX 5060, NVIDIA GeForce RTX 4060, AMD Radeon RX 9060 XT 8GB, and NVIDIA GeForce RTX 5060 Ti 8GB.
