DeepSeek V4 — Local AI Model by DeepSeek

Written by Jakub Rusinowski · Last updated July 30, 2026

DeepSeek's April 24, 2026 preview release, MIT licensed with a 1M-token context window. V4-Pro (1.6T total / 49B active) is server-class only — not runnable on consumer hardware even heavily quantized. V4-Flash (284B total / 13B active) is workstation-tier but local support is still WIP: llama.cpp has no merged upstream support as of June 2026 (only experimental forks), and Ollama's listings appear to route to cloud-hosted inference rather than a true local download. DeepSeek R2 remains unreleased/rumored as of June 15, 2026 and is intentionally not listed.

Hardware Requirements

DeepSeek V4-Flash	Min 172 GB VRAM · Q4 (experimental) · 1,000,000 ctx · `ollama run deepseek-v4-flash (cloud-hosted on Ollama; local = WIP llama.cpp forks only)`
DeepSeek V4-Pro	Min 967 GB VRAM · Q2 (experimental, datacenter only) · 1,000,000 ctx ·

Recommended GPU

The cheapest GPU that runs DeepSeek V4 locally (min 172 GB VRAM) is the Apple M2 Ultra (192 GB).

Affiliate disclosure: Some links on this page are affiliate links — if you buy through them, LLM Configurator may earn a commission at no extra cost to you. As an Amazon Associate, LLM Configurator earns from qualifying purchases.

Apple Mac Studio M2 Ultra

2026 prices are volatile — check the current listing.

Check price on Amazon

How to Run Locally

Install Ollama then run: ollama run deepseek-v4-flash (cloud-hosted on Ollama; local = WIP llama.cpp forks only)

Minimum VRAM: 172 GB. For best results use Q4_K_M quantization.

DeepSeek V4 — Frequently Asked Questions

How much VRAM does DeepSeek V4 need?

DeepSeek V4 needs about 172 GB VRAM at Q4 (experimental) quantization for its smallest variant. Variants: DeepSeek V4-Flash (172 GB, Q4 (experimental)); DeepSeek V4-Pro (967 GB, Q2 (experimental, datacenter only)). On Apple Silicon, unified memory counts toward this requirement.

Can I run DeepSeek V4 on an RTX 4090 (24 GB)?

DeepSeek V4's smallest variant needs about 172 GB, which exceeds a single RTX 4090 (24 GB). Use multiple GPUs, a higher-VRAM card, or Apple Silicon with large unified memory.

What quantization should I use for DeepSeek V4?

Q4_K_M is the best balance of quality and VRAM for DeepSeek V4 in most cases. Choose Q8_0 for near-lossless quality if you have spare VRAM, or smaller quants (Q3/Q2) only when memory is tight.

How do I run DeepSeek V4 with Ollama?

Install Ollama, then run: ollama run deepseek-v4-flash (cloud-hosted on Ollama; local = WIP llama.cpp forks only). This downloads DeepSeek V4 and starts a local, OpenAI-compatible endpoint — no internet connection is needed after the initial download.