Best LLMs for 16 GB VRAM

Written by Jakub Rusinowski · Last updated June 15, 2026

These are the strongest local models that fit entirely in 16 GB of VRAM, ranked by capability. Each entry lists the quantization level and the VRAM needed to fit, plus estimated tokens/sec on a reference GPU.
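As a rough illustration of why quantization decides what fits, the sketch below estimates the size of quantized weights from the parameter count. The default of 4.85 bits per weight for Q4_K_M and the 1.5 GB overhead allowance are assumptions made for this example, not figures from this page, and the function names are hypothetical. This page does not state how its own size figures were derived, so expect differences.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Approximate size of quantized weights in GB (10^9 bytes).

    bits_per_weight=4.85 is an assumed average for Q4_K_M; the real value
    varies by model architecture and quantizer version.
    """
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def fits_in_vram(
    params_billion: float,
    vram_gb: float = 16.0,
    overhead_gb: float = 1.5,  # assumed allowance for KV cache and runtime buffers
    bits_per_weight: float = 4.85,
) -> bool:
    """Return True if the estimated weights plus overhead fit in vram_gb."""
    return estimate_weight_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    for size in (7, 14, 22, 70):
        gb = estimate_weight_gb(size)
        print(f"{size}B dense at ~4.85 bpw: ~{gb:.1f} GB weights, fits in 16 GB: {fits_in_vram(size)}")
```

Under these assumptions a dense 14B or 22B model fits at 4-bit, while a dense 70B model does not. Longer contexts need a larger overhead allowance than the one assumed here.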

GPUs at This Tier

Ranked Models

Qwen 2.5 Family — Qwen 2.5 14B Instruct · Q4_K_M · 9.5 GB · ~34 tok/s on AMD Radeon RX 9060 XT 16GB
Qwen 3.5 (Legacy Listing — Unverified) — Qwen 3.5 122B-A10B (MoE) · Q4_K_M · 13.5 GB · ~289 tok/s on AMD Radeon RX 9060 XT 16GB
Qwen 3 — Qwen 3 14B · Q4_K_M · 9.5 GB · ~34 tok/s on AMD Radeon RX 9060 XT 16GB
Codestral — Codestral 22B · Q4_K_M · 13 GB · ~25 tok/s on AMD Radeon RX 9060 XT 16GB
DeepSeek R1 — DeepSeek R1 Distill Qwen 14B · Q4_K_M · 9.2 GB · ~35 tok/s on AMD Radeon RX 9060 XT 16GB
Phi-4 Family — Phi-4 (14B) · Q4_K_M · 9.2 GB · ~35 tok/s on AMD Radeon RX 9060 XT 16GB
Qwen3-Coder — Qwen3-Coder 80B-A3B (MoE) · Q4_K_M · 7.5 GB · ~400 tok/s on AMD Radeon RX 9060 XT 16GB
InternLM 3 — InternLM 3 20B Instruct · Q4_K_M · 12.5 GB · ~26 tok/s on AMD Radeon RX 9060 XT 16GB
Qwen 3 — Qwen 3 30B-A3B (MoE) · Q4_K_M · 8 GB · ~400 tok/s on AMD Radeon RX 9060 XT 16GB
Granite 3.0 — Granite 3.0 8B Instruct · Q4_K_M · 5.5 GB · ~58 tok/s on AMD Radeon RX 9060 XT 16GB
Qwen 3.5 (Legacy Listing — Unverified) — Qwen 3.5 14B · Q4_K_M · 9.5 GB · ~34 tok/s on AMD Radeon RX 9060 XT 16GB
Qwen 2.5 Family — Qwen 2.5 7B Instruct · Q4_K_M · 4.8 GB · ~67 tok/s on AMD Radeon RX 9060 XT 16GB
Llama 4 — Llama 4 Scout 17B · Q4_K_M · 10.5 GB · ~195 tok/s on AMD Radeon RX 9060 XT 16GB
Qwen 3 — Qwen 3 8B · Q4_K_M · 5.5 GB · ~58 tok/s on AMD Radeon RX 9060 XT 16GB
DeepSeek R1 — DeepSeek R1 Distill Llama 8B · Q4_K_M · 5.8 GB · ~55 tok/s on AMD Radeon RX 9060 XT 16GB
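The listed size also tells you how much of the 16 GB is left over for context and runtime buffers. The sketch below copies a few rows from the ranked list above and sorts them by remaining VRAM. The `headroom` helper and the row selection are illustrative choices for this example, and the arithmetic assumes the full 16 GB is available to the model, which is rarely quite true on a card that also drives a display.

```python
VRAM_GB = 16.0

# (model, listed size in GB, listed tok/s), copied from the ranked list above
LISTED = [
    ("Qwen 2.5 14B Instruct", 9.5, 34),
    ("Codestral 22B", 13.0, 25),
    ("Phi-4 (14B)", 9.2, 35),
    ("InternLM 3 20B Instruct", 12.5, 26),
    ("Granite 3.0 8B Instruct", 5.5, 58),
    ("Qwen 2.5 7B Instruct", 4.8, 67),
]


def headroom(rows, vram_gb=VRAM_GB):
    """Return (model, free GB, tok/s) tuples sorted by free VRAM, largest first."""
    out = [(name, round(vram_gb - size_gb, 1), tps) for name, size_gb, tps in rows]
    return sorted(out, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, free_gb, tps in headroom(LISTED):
        print(f"{name:26s} {free_gb:4.1f} GB free  ~{tps} tok/s")
```

Smaller models leave more room for long contexts, while the 20B to 22B entries fit with only a few GB to spare.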

FAQ

What LLMs run well with 16 GB VRAM?

The Qwen 2.5 family, Qwen 3.5 (Legacy Listing — Unverified), Qwen 3, Codestral, and DeepSeek R1 all have models that fit in 16 GB of VRAM at Q4_K_M quantization.

Which GPUs have 16 GB VRAM?

GPUs with 16 GB of VRAM include the AMD Radeon RX 9060 XT 16GB, NVIDIA GeForce RTX 5060 Ti 16GB, NVIDIA GeForce RTX 4060 Ti 16GB, and AMD Radeon RX 7800 XT.
