GPU Benchmark Leaderboard

Community-sourced tokens/sec benchmarks for major GPUs running popular local LLMs.

Apple M4 Max (128 GB): ~95 tokens/sec (Llama 3.1 8B)
NVIDIA RTX 4090 (24 GB): ~120 tokens/sec (Llama 3.1 8B)
AMD RX 7900 XTX (24 GB): ~80 tokens/sec (Llama 3.1 8B)
NVIDIA RTX 4060 Ti (16 GB): ~55 tokens/sec (Llama 3.1 8B)
Apple M2 Ultra (192 GB): ~85 tokens/sec (Llama 3.1 70B)