Apple M3 Pro — Local LLM Performance & Compatibility

The 3 nm successor to the M2 Pro. Up to 36 GB of unified memory at 150 GB/s. Roughly 55 t/s on 8B models, with Qwen 3 14B and Phi-4 14B running comfortably at Q4. About 30 W total system power in a laptop.

Technical Specifications

VRAM: 36 GB unified memory
Memory Bandwidth: 150 GB/s
TDP: 30 W (total system power)
Architecture: ARM, TSMC 3 nm
Release Year: 2023
MSRP at Launch: $1,999
Inference Speed (Llama 3.1 8B, Q4_K_M): ~55 tokens/sec

LLMs Compatible with 36 GB Unified Memory

All models below run comfortably in 36 GB of unified memory with Q4_K_M quantization; a rough memory-fit sketch follows the list.

Llama 3.1 Family: 6 GB VRAM · Q4_K_M · ollama run llama3.1
Llama 3.2 Family: 8 GB VRAM · Q4_K_M · ollama run llama3.2-vision:11b
Qwen 3: 20 GB VRAM · Q4_K_M · ollama run qwen3:32b
Gemma 3: 16 GB VRAM · Q4_K_M · ollama run gemma3:27b
Phi-4 Family: 10 GB VRAM · Q4_K_M · ollama run phi4
Phi-4 Mini: 2 GB VRAM · Q4_K_M · ollama run phi4-mini
Mistral Family: 16 GB VRAM · Q4_K_M · ollama run mistral-small
DeepSeek R1: 20 GB VRAM · Q4_K_M · ollama run deepseek-r1:32b
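
As a rough sanity check on the figures above: a Q4_K_M weight file takes roughly 0.57 bytes per parameter, and macOS reserves part of unified memory for the system, so not all 36 GB is GPU-visible. The Python sketch below does that arithmetic; the bytes-per-parameter figure, overhead constant, and ~75% usable-memory cap are rule-of-thumb assumptions, not measurements.

# Rough memory-fit check for Q4_K_M models in 36 GB unified memory.
# All constants are rule-of-thumb assumptions, not measured values.
BYTES_PER_PARAM_Q4_K_M = 0.57  # ~4.5 bits per weight on average
RUNTIME_OVERHEAD_GB = 2.0      # KV cache, activations, runtime buffers
USABLE_MEMORY_GB = 36 * 0.75   # macOS keeps a share for the OS

def fits(params_billion: float) -> bool:
    needed_gb = params_billion * BYTES_PER_PARAM_Q4_K_M + RUNTIME_OVERHEAD_GB
    return needed_gb <= USABLE_MEMORY_GB

for size_b in (8, 14, 27, 32, 70):
    print(f"{size_b}B: {'fits' if fits(size_b) else 'too large'}")

By this estimate a 32B model needs about 20 GB, matching the Qwen 3 and DeepSeek R1 rows above, while 70B models do not fit at Q4_K_M.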

Quick Start with Ollama

Install Ollama, then run the recommended model for this chip:

ollama run qwen3:14b
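
Beyond the CLI, Ollama exposes a local REST API on port 11434. Below is a minimal sketch of calling it from Python, assuming the server is running and qwen3:14b has already been pulled:

import json
import urllib.request

# Minimal sketch: one-shot generation through Ollama's local REST API.
# Assumes `ollama serve` is running and qwen3:14b is already pulled.
payload = json.dumps({
    "model": "qwen3:14b",
    "prompt": "Explain unified memory in one sentence.",
    "stream": False,  # return a single JSON object instead of a stream
}).encode()
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])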

FAQ

Can the Apple M3 Pro run local LLMs?

Yes. The Apple M3 Pro has up to 36 GB of unified memory at 150 GB/s and runs 8B models at roughly 55 tokens/sec. 14B models such as Qwen 3 14B and Phi-4 14B run comfortably at Q4 quantization.

How fast is the Apple M3 Pro for AI inference?

The Apple M3 Pro runs Llama 3.1 8B at ~55 tokens/sec with Q4_K_M quantization.
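
To reproduce that number on your own machine, the non-streaming /api/generate response includes eval_count and eval_duration (in nanoseconds), from which decode speed follows directly. A minimal sketch; the prompt is arbitrary:

import json
import urllib.request

# Compute generated tokens/sec from Ollama's own timing fields.
payload = json.dumps({
    "model": "llama3.1",
    "prompt": "Write a haiku about silicon.",
    "stream": False,
}).encode()
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    stats = json.load(resp)
# eval_duration is reported in nanoseconds
print(f"{stats['eval_count'] / stats['eval_duration'] * 1e9:.1f} tokens/sec")

Running ollama run llama3.1 --verbose prints the same eval rate after each response.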

What LLMs can I run on 36 GB VRAM?

With 36 GB of unified memory you can run the Llama 3.1 and 3.2 families, Qwen 3, Gemma 3, the Phi-4 family, Mistral Small, and DeepSeek R1 at Q4_K_M. Use Ollama for the easiest setup: ollama run qwen3:14b.
