LLM Configurator — Free GPU VRAM Checker for Local AI Models
LLM Configurator is the definitive free tool for checking GPU compatibility with local LLMs. Enter your GPU's VRAM and system RAM to instantly discover which open-source AI models you can run — with Ollama install commands, speed estimates, and electricity cost calculations.
VRAM Requirements Quick Reference
VRAM     | Models You Can Run
2–4 GB   | SmolLM2 1.7B, Phi-3.5 Mini, BitNet 3B, Gemma 3 4B (Q4), Llama 3.2 1B/3B
6–8 GB   | Llama 3.1 8B, Gemma 3 4B (FP16), Qwen 2.5 7B, Phi-4 Mini, DeepSeek R1 8B
8–12 GB  | Phi-4 14B (Q4), Qwen 2.5 14B (Q4), Mistral NeMo 12B, Gemma 3 12B (Q4)
12–16 GB | Llama 4 Scout 17B (Q4), Qwen 3 14B, Qwen 3 30B-A3B (MoE)
16–24 GB | Qwen 3 32B (Q4), Mistral Small 3.1 24B (Q4), Gemma 3 27B (Q4)
24+ GB   | Llama 3.3 70B (Q4), DeepSeek R1 32B, Llama 4 Maverick (Q4)
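The figures above follow a simple rule of thumb: parameter count times bits per weight, plus runtime overhead for the KV cache and framework. A minimal sketch, where the 20% overhead factor is an illustrative assumption rather than a measured value:

```python
def estimate_vram_gb(params_billion: float, bits: int = 4, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight size plus ~20% for KV cache and runtime overhead.

    1B parameters at 8 bits per weight is approximately 1 GB of weights.
    """
    weight_gb = params_billion * bits / 8
    return round(weight_gb * overhead, 1)

print(estimate_vram_gb(8))           # 8B model at Q4  -> 4.8 GB
print(estimate_vram_gb(8, bits=16))  # 8B model at FP16 -> 19.2 GB
```

This matches the table: an 8B model at Q4 lands in the 6–8 GB tier once context length and desktop VRAM usage are accounted for, while the same model at FP16 needs a 24 GB card.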
Featured Models
Llama 4 Scout: 17B active / 109B total parameters (MoE). Requires ~10 GB VRAM at Q4. Run with: ollama run llama4:scout
Llama 4 Maverick: 17B active / 400B total parameters (MoE). Requires ~24 GB VRAM at Q4. Run with: ollama run llama4:maverick
DeepSeek R1: State-of-the-art reasoning model available in 8B, 14B, 32B, and 671B variants. The 8B distill requires 6 GB VRAM: ollama run deepseek-r1:8b
Gemma 3: Available in 1B (2 GB VRAM), 4B (3 GB), 12B (8 GB Q4), and 27B (18 GB Q4). Run: ollama run gemma3:4b
Qwen 3: Includes 8B, 14B, and 32B dense models plus the MoE variants 30B-A3B (only 3B active params, fits in 6 GB!) and 235B-A22B.
Mistral Small 3.1: 24B multimodal (vision + text) model. Requires 16 GB VRAM at Q4. Run: ollama run mistral-small3.1
Frequently Asked Questions
How much VRAM do I need to run local LLMs?
Minimum 4 GB for 3–4B models, 8 GB for 7–8B models, 16 GB for 13–30B models, and 24 GB for 70B models with Q4 quantization. Apple Silicon uses unified memory — a MacBook with 16 GB runs 7B models at 30–50 tokens/sec.
Can I run Llama 4 on my laptop?
Llama 4 Scout requires ~10 GB VRAM or 16 GB unified memory. It runs well on Apple M3/M4 Pro with 18 GB+. Maverick (400B total) needs 24 GB VRAM.
What is the difference between Ollama and LM Studio?
Ollama is CLI/API-first and ideal for developers. LM Studio is a GUI desktop app better suited for beginners. Both are free, support GGUF models, and provide an OpenAI-compatible local API.
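Because both tools expose an OpenAI-compatible endpoint, switching between them (or from a cloud API) is mostly a matter of changing the base URL: Ollama serves on port 11434 by default, LM Studio on port 1234. A minimal sketch of the request payload such a server expects; the model tag is illustrative:

```python
import json

def chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload for a local server.

    POST this as JSON to e.g. http://localhost:11434/v1/chat/completions (Ollama).
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

payload = chat_request("llama3.1:8b", "Summarize quantization in one sentence.")
print(json.dumps(payload, indent=2))
```

Any OpenAI client library can talk to these servers by pointing its base URL at localhost; no code changes beyond that are needed.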
Is running AI locally cheaper than ChatGPT?
For heavy users, yes. GPT-4o costs $2.50/M input tokens. 500K tokens/day costs ~$38/month. The same workload on an RTX 4060 costs ~$2–3/month in electricity. Break-even on hardware is typically 3–8 months. Use the Cost Calculator for your specific numbers.
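The arithmetic above can be sketched as follows. The API price and daily token volume come from the answer; the GPU wattage, daily usage hours, and electricity price are illustrative assumptions, so substitute your own:

```python
def api_cost_per_month(tokens_per_day: float, price_per_million: float = 2.50) -> float:
    """Monthly API cost in dollars, assuming 30 days and input-token pricing only."""
    return tokens_per_day * 30 / 1_000_000 * price_per_million

def local_cost_per_month(gpu_watts: float = 150, hours_per_day: float = 2,
                         price_per_kwh: float = 0.15) -> float:
    """Monthly electricity cost in dollars for running a GPU locally (assumed figures)."""
    return gpu_watts / 1000 * hours_per_day * 30 * price_per_kwh

print(api_cost_per_month(500_000))  # 500K tokens/day at $2.50/M -> $37.50/month
print(local_cost_per_month())       # ~150W for 2h/day at $0.15/kWh -> ~$1.35/month
```

With a mid-range GPU in the $300–400 range, the difference of roughly $35/month recovers the hardware cost within a year, consistent with the 3–8 month break-even range quoted above for heavier usage.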
Can I run an LLM on my phone without internet?
Yes. PocketPal AI (iOS & Android) and MLC Chat (Android) let you run Llama 3.2 3B, Phi-3 Mini, or Gemma 3 4B completely offline. See the phone guide.
What is quantization?
Quantization compresses model weights from 16-bit floating point (FP16) to 4-bit or 8-bit integers, cutting VRAM requirements by 2–4×. Q4_K_M is the most popular format, typically losing only 2–3% quality vs FP16.
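The 2–4× reduction follows directly from bits per weight. A minimal sketch, assuming Q4_K_M averages about 4.5 bits per weight (it mixes 4-bit and higher-precision blocks, so this is an approximation):

```python
# Approximate bits per weight for common GGUF formats (Q4_K_M is a mixed format,
# so 4.5 bits is an assumed average, not an exact figure).
BITS_PER_WEIGHT = {"FP16": 16, "Q8_0": 8, "Q4_K_M": 4.5}

def weight_size_gb(params_billion: float, fmt: str) -> float:
    """Size of the model weights alone in GB, before any runtime overhead."""
    return round(params_billion * BITS_PER_WEIGHT[fmt] / 8, 1)

for fmt in BITS_PER_WEIGHT:
    print(fmt, weight_size_gb(8, fmt))  # 8B model: 16.0 / 8.0 / 4.5 GB
```

An 8B model drops from 16 GB of weights at FP16 to about 4.5 GB at Q4_K_M, which is why it fits on an 8 GB consumer GPU.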
Setup Guides
About LLM Configurator
LLM Configurator is a free, independent tool built by AI enthusiasts for the local AI community. It supports 75+ open-source models including Llama 4, Qwen 3, Gemma 3, DeepSeek R1/V3, Mistral Small 3.1, and Phi-4 Mini. No account required. No ads. Free forever.
Contact: contact@llmconfigurator.com
Full AI-readable content: llms-full.txt | llms.txt | sitemap.xml