DeepSeek V4 — Local AI Model by DeepSeek

Written by Jakub Rusinowski · Last updated June 15, 2026

DeepSeek's April 24, 2026 preview release, MIT licensed with a 1M-token context window. V4-Pro (1.6T total / 49B active) is server-class only — not runnable on consumer hardware even heavily quantized. V4-Flash (284B total / 13B active) is workstation-tier but local support is still WIP: llama.cpp has no merged upstream support as of June 2026 (only experimental forks), and Ollama's listings appear to route to cloud-hosted inference rather than a true local download. DeepSeek R2 remains unreleased/rumored as of June 15, 2026 and is intentionally not listed.

Hardware Requirements

DeepSeek V4-FlashMin 140 GB VRAM · Q4 (experimental) · 1,000,000 ctx · ollama run deepseek-v4-flash (cloud-hosted on Ollama; local = WIP llama.cpp forks only)
DeepSeek V4-ProMin 400 GB VRAM · Q2 (experimental, datacenter only) · 1,000,000 ctx ·

How to Run Locally

Install Ollama then run: ollama run deepseek-v4-flash (cloud-hosted on Ollama; local = WIP llama.cpp forks only)

Minimum VRAM: 140 GB. For best results use Q4_K_M quantization.

DeepSeek V4 — Frequently Asked Questions

How much VRAM does DeepSeek V4 need?

DeepSeek V4 needs about 140 GB VRAM at Q4 (experimental) quantization for its smallest variant. Variants: DeepSeek V4-Flash (140 GB, Q4 (experimental)); DeepSeek V4-Pro (400 GB, Q2 (experimental, datacenter only)). On Apple Silicon, unified memory counts toward this requirement.

Can I run DeepSeek V4 on an RTX 4090 (24 GB)?

DeepSeek V4's smallest variant needs about 140 GB, which exceeds a single RTX 4090 (24 GB). Use multiple GPUs, a higher-VRAM card, or Apple Silicon with large unified memory.

What quantization should I use for DeepSeek V4?

Q4_K_M is the best balance of quality and VRAM for DeepSeek V4 in most cases. Choose Q8_0 for near-lossless quality if you have spare VRAM, or smaller quants (Q3/Q2) only when memory is tight.

How do I run DeepSeek V4 with Ollama?

Install Ollama, then run: ollama run deepseek-v4-flash (cloud-hosted on Ollama; local = WIP llama.cpp forks only). This downloads DeepSeek V4 and starts a local, OpenAI-compatible endpoint — no internet connection is needed after the initial download.

Can I Run DeepSeek V4 on My GPU?