NVIDIA GeForce RTX 5090 — Local AI Performance

The fastest consumer GPU for local AI. 32 GB VRAM fits Qwen 3 32B and Llama 3.3 70B with Q3 quantization on a single card. 67% faster than RTX 4090.

Quick Specs

VRAM32 GB
Memory Bandwidth1792 GB/s
TDP575 W
ArchitectureBlackwell GB202
MSRP$1,999
Speed (Llama 3.1 8B Q4_K_M)~213 tokens/sec

Compatible LLMs

← Check your hardware | Full benchmarks