Devstral-2 — Local AI Model by Mistral AI

Written by Jakub Rusinowski · Last updated June 15, 2026

Mistral AI's second-generation coding specialist released April 2026. The 123B Sparse variant scores 71.6% on SWE-bench Verified — the highest open-source score for code agent tasks at the time of release. Built for agentic software engineering: multi-file editing, repo navigation, and test-driven development. Apache 2.0 licensed.

Hardware Requirements

Devstral-2 123BMin 68 GB VRAM · Q4_K_M · 128,000 ctx · ollama run devstral:123b
Devstral-2 22BMin 13 GB VRAM · Q4_K_M · 128,000 ctx · ollama run devstral:22b

How to Run Locally

Install Ollama then run: ollama run devstral:123b

Minimum VRAM: 13 GB. For best results use Q4_K_M quantization.

Devstral-2 — Frequently Asked Questions

How much VRAM does Devstral-2 need?

Devstral-2 needs about 13 GB VRAM at Q4_K_M quantization for its smallest variant. Variants: Devstral-2 123B (68 GB, Q4_K_M); Devstral-2 22B (13 GB, Q4_K_M). On Apple Silicon, unified memory counts toward this requirement.

Can I run Devstral-2 on an RTX 4090 (24 GB)?

Yes — Devstral-2 runs on an RTX 4090 (24 GB) and other 24 GB cards such as the RTX 3090. Smaller variants also fit comfortably on 8–16 GB GPUs at Q4_K_M.

What quantization should I use for Devstral-2?

Q4_K_M is the best balance of quality and VRAM for Devstral-2 in most cases. Choose Q8_0 for near-lossless quality if you have spare VRAM, or smaller quants (Q3/Q2) only when memory is tight.

How do I run Devstral-2 with Ollama?

Install Ollama, then run: ollama run devstral:123b. This downloads Devstral-2 and starts a local, OpenAI-compatible endpoint — no internet connection is needed after the initial download.

Can I Run Devstral-2 on My GPU?