Name: LLM Configurator — GPU VRAM Checker
Author: LLM Configurator

Question 1

Can the Apple M2 Max run local LLMs?

Accepted Answer

Yes. The Apple M2 Max supports up to 96 GB of unified memory and runs about 90 tokens/sec on 8B models and about 24 tokens/sec on 70B models. It is popular in Mac Studio configurations and runs as a silent, 40 W system.

Question 2

How fast is the Apple M2 Max for AI inference?

Accepted Answer

The Apple M2 Max runs Llama 3.1 8B at ~90 tokens/sec with Q4_K_M quantization. Llama 3.3 70B at the same quantization reaches ~24 tokens/sec.
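To check figures like these on your own machine, the following is a minimal sketch that assumes an Ollama server is running on its default local port and that the model has already been pulled. It uses the `eval_count` and `eval_duration` fields (duration in nanoseconds) that Ollama returns from a non-streaming `/api/generate` call. The helper names `tokens_per_second` and `benchmark`, and the default prompt, are illustrative and not part of any tool.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumed; change it if your server listens elsewhere).
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Convert Ollama's generation counters into tokens per second."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def benchmark(model: str, prompt: str = "Explain unified memory in one paragraph.") -> float:
    """Run one non-streaming generation and return the measured decode speed."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        stats = json.load(response)
    # eval_count = generated tokens, eval_duration = generation time in nanoseconds
    return tokens_per_second(stats["eval_count"], stats["eval_duration"])
```

With the server running, `benchmark("llama3.1")` returns the decode speed for a single run. Results vary with prompt length, context size, and background load, so averaging several runs gives a fairer comparison.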

Question 3

What LLMs can I run on 96 GB VRAM?

Accepted Answer

With 96 GB you can run Llama 3.3, the Llama 3.1 family, DeepSeek R1, Qwen 3, and Gemma 3. For the easiest setup, use Ollama: `ollama run llama3.3:70b`.

VRAM	96 GB unified memory
Memory Bandwidth	400 GB/s
TDP	40 W
Architecture	ARM, 5nm TSMC
Release Year	2023
MSRP at Launch	$3,499
Inference Speed (Llama 3.1 8B Q4_K_M)	~90 tokens/sec
Inference Speed (Llama 3.3 70B Q4_K_M)	~24 tokens/sec

Llama 3.3	24 GB VRAM · Q2_K_XS (Tight) · `ollama run llama3.3`
Llama 3.1 Family	6 GB VRAM · Q4_K_M · `ollama run llama3.1`
DeepSeek R1	20 GB VRAM · Q4_K_M · `ollama run deepseek-r1:32b`
Qwen 3	80 GB VRAM · Q4_K_M · `ollama run qwen3:235b-a22b`
Gemma 3	16 GB VRAM · Q4_K_M · `ollama run gemma3:27b`
Mistral Small 3.1	14 GB VRAM · Q4_K_M · `ollama run mistral-small3.1`
Phi-4 Family	10 GB VRAM · Q4_K_M · `ollama run phi4`
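The compatibility list above amounts to a simple comparison of each model's memory requirement against the available memory. A rough sketch of that check follows, using the GB figures and Ollama tags from the list. The `reserve_gb` parameter is a hypothetical knob for memory you want to leave to the operating system and other applications; it is not a value taken from this page.

```python
# Memory requirements in GB, copied from the list above (at the listed quantization).
MODELS = {
    "llama3.3": 24,
    "llama3.1": 6,
    "deepseek-r1:32b": 20,
    "qwen3:235b-a22b": 80,
    "gemma3:27b": 16,
    "mistral-small3.1": 14,
    "phi4": 10,
}


def models_that_fit(total_gb: float, reserve_gb: float = 0.0) -> list[str]:
    """Return the Ollama tags whose requirement fits the budget, largest first."""
    budget = total_gb - reserve_gb
    fitting = [tag for tag, need in MODELS.items() if need <= budget]
    return sorted(fitting, key=MODELS.get, reverse=True)
```

For example, `models_that_fit(96)` returns all seven tags, while `models_that_fit(96, reserve_gb=24)` drops `qwen3:235b-a22b` because its 80 GB requirement exceeds the remaining 72 GB.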

Apple M2 Max — Local LLM Performance & Compatibility

Technical Specifications

LLMs Compatible with 96 GB Unified Memory

Best Use Cases

Quick Start with Ollama
