Best LLMs for 4 GB VRAM

Written by Jakub Rusinowski · Last updated June 15, 2026

These are the strongest local models that fit entirely in 4 GB of VRAM, ranked by capability. Each entry lists the quantization level needed to fit, the resulting memory footprint, and an estimated generation speed in tokens per second on a reference GPU.
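
To illustrate why the quantization level decides whether a model fits, here is a minimal Python sketch that estimates the size of the weights from the parameter count and the bits stored per weight. The function names, the 0.5 GB overhead allowance, and the figure of roughly 4.85 bits per weight for Q4_K_M are assumptions made for illustration and do not come from this page. The sizes listed below may include runtime overhead that this sketch does not model.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights, bits_per_weight / 8 bytes each, divided by 1e9
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float = 4.0, overhead_gb: float = 0.5) -> bool:
    """Check whether weights plus an assumed fixed overhead fit in the budget."""
    # overhead_gb is a hypothetical allowance for KV cache and runtime buffers
    return weight_size_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    # A 3.8B-parameter model at 16 bits per weight versus an assumed ~4.85 bits
    for bits in (16.0, 4.85):
        size = weight_size_gb(3.8, bits)
        print(f"{bits:>5} bits/weight: {size:.2f} GB, fits in 4 GB: {fits_in_vram(3.8, bits)}")
```

Under these assumptions, the same 3.8B model is far too large for a 4 GB card at 16 bits per weight but fits comfortably once quantized to about 4 to 5 bits.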

Ranked Models

Phi-4 Mini: Phi-4 Mini (3.8B) · Q4_K_M · 2.5 GB · ~179 tok/s on NVIDIA GeForce RTX 5060
Qwen 3.5: Qwen 3.5 4B · Q4_K_M · 3.4 GB · ~132 tok/s on NVIDIA GeForce RTX 5060
Phi 3.5 Family: Phi 3.5 Mini · Q4_K_M · 2.6 GB · ~172 tok/s on NVIDIA GeForce RTX 5060
Gemma 4 (Legacy Listing, Unverified): Gemma 4 4B · Q4_K_M · 3.2 GB · ~140 tok/s on NVIDIA GeForce RTX 5060
BitNet b1.58: BitNet b1.58 3B · 1.58-bit · 1.8 GB · ~249 tok/s on NVIDIA GeForce RTX 5060
Gemma 3n: Gemma 3n E4B · Q4_K_M · 3 GB · ~149 tok/s on NVIDIA GeForce RTX 5060
StarCoder 2: StarCoder 2 3B · Q4_K_M · 2 GB · ~224 tok/s on NVIDIA GeForce RTX 5060
EXAONE 3.5: EXAONE 3.5 2.4B · Q4_K_M · 1.8 GB · ~249 tok/s on NVIDIA GeForce RTX 5060
Aya 3B (Tiny Aya): Aya 3B · Q4_K_M · 2.2 GB · ~204 tok/s on NVIDIA GeForce RTX 5060
Qwen 3.5: Qwen 3.5 2B · Q4_K_M · 2.7 GB · ~166 tok/s on NVIDIA GeForce RTX 5060
IBM Granite 4.1: Granite 4.1 3B · Q4_K_M · 2 GB · ~224 tok/s on NVIDIA GeForce RTX 5060
Ministral: Ministral 3B · Q4_K_M · 2.3 GB · ~195 tok/s on NVIDIA GeForce RTX 5060
Falcon 3: Falcon 3 3B Instruct · Q4_K_M · 2 GB · ~224 tok/s on NVIDIA GeForce RTX 5060
Llama 3.2 Family: Llama 3.2 3B Instruct · Q4_K_M · 2.2 GB · ~204 tok/s on NVIDIA GeForce RTX 5060
Gemma 3n: Gemma 3n E2B · Q4_K_M · 1.5 GB · ~299 tok/s on NVIDIA GeForce RTX 5060
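
The speed estimates in this list are consistent with a simple memory-bandwidth bound: generating each token requires reading the full set of weights once, so throughput is roughly memory bandwidth divided by model size. The sketch below is an illustration of that rule, not the page's stated method. The 448 GB/s bandwidth figure for the reference GPU and the function name are assumptions that do not appear in the source.

```python
# Assumption: memory bandwidth of the reference GPU in GB/s (not stated on this page).
ASSUMED_BANDWIDTH_GB_S = 448.0


def estimate_tok_per_sec(model_size_gb: float,
                         bandwidth_gb_s: float = ASSUMED_BANDWIDTH_GB_S) -> float:
    """Upper-bound decode speed if every token reads all weights once."""
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return bandwidth_gb_s / model_size_gb


# (size in GB, listed tok/s) pairs copied from a few entries in the ranking
LISTED = {
    "Phi-4 Mini (3.8B)": (2.5, 179),
    "Qwen 3.5 4B": (3.4, 132),
    "BitNet b1.58 3B": (1.8, 249),
    "Gemma 3n E2B": (1.5, 299),
}

if __name__ == "__main__":
    for name, (size_gb, listed_tps) in LISTED.items():
        estimate = estimate_tok_per_sec(size_gb)
        print(f"{name}: estimated {estimate:.1f} tok/s, listed ~{listed_tps} tok/s")
```

Because this is a bandwidth-only bound, real throughput also depends on the runtime, context length, and batch size, so the listed numbers are best read as rough ceilings.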

FAQ

What LLMs run well with 4 GB VRAM?

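The five highest-ranked entries, Phi-4 Mini, Qwen 3.5, Phi 3.5 Family, Gemma 4 (Legacy Listing, Unverified), and BitNet b1.58, all fit in 4 GB of VRAM at the quantization levels listed above, as do the other ten entries in the ranking.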

Which GPUs have 4 GB VRAM?

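The GPUs associated with this tier are the NVIDIA GeForce RTX 5060, NVIDIA GeForce RTX 4060, AMD Radeon RX 9060 XT 8GB, and NVIDIA GeForce RTX 5060 Ti 8GB. Each of these ships with 8 GB of VRAM rather than 4 GB, so a model that fits in 4 GB uses about half of the card's memory. Cards with exactly 4 GB are mostly older or entry-level models, such as the NVIDIA GeForce GTX 1650.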
