Devstral-2 — Local AI Model by Mistral AI

Written by Jakub Rusinowski · Last updated July 30, 2026

Mistral AI's second-generation coding specialist released April 2026. The 123B Sparse variant scores 71.6% on SWE-bench Verified — the highest open-source score for code agent tasks at the time of release. Built for agentic software engineering: multi-file editing, repo navigation, and test-driven development. Apache 2.0 licensed.

Hardware Requirements

Devstral-2 123B	Min 75 GB VRAM · Q4_K_M · 128,000 ctx · `ollama run devstral:123b`
Devstral-2 22B	Min 14 GB VRAM · Q4_K_M · 128,000 ctx · `ollama run devstral:22b`

Recommended GPU

The cheapest GPU that runs Devstral-2 locally (min 14 GB VRAM) is the AMD Radeon RX 9060 XT 16GB (16 GB).

Affiliate disclosure: Some links on this page are affiliate links — if you buy through them, LLM Configurator may earn a commission at no extra cost to you. As an Amazon Associate, LLM Configurator earns from qualifying purchases.

AMD Radeon RX 9060 XT 16GB

Launch MSRP: $349

2026 prices are volatile — check the current listing.

Check price on Amazon

How to Run Locally

Install Ollama then run: ollama run devstral:123b

Minimum VRAM: 14 GB. For best results use Q4_K_M quantization.

Devstral-2 — Frequently Asked Questions

How much VRAM does Devstral-2 need?

Devstral-2 needs about 14 GB VRAM at Q4_K_M quantization for its smallest variant. Variants: Devstral-2 123B (75 GB, Q4_K_M); Devstral-2 22B (14 GB, Q4_K_M). On Apple Silicon, unified memory counts toward this requirement.

Can I run Devstral-2 on an RTX 4090 (24 GB)?

Yes — Devstral-2 runs on an RTX 4090 (24 GB) and other 24 GB cards such as the RTX 3090. Smaller variants also fit comfortably on 8–16 GB GPUs at Q4_K_M.

What quantization should I use for Devstral-2?

Q4_K_M is the best balance of quality and VRAM for Devstral-2 in most cases. Choose Q8_0 for near-lossless quality if you have spare VRAM, or smaller quants (Q3/Q2) only when memory is tight.

How do I run Devstral-2 with Ollama?

Install Ollama, then run: ollama run devstral:123b. This downloads Devstral-2 and starts a local, OpenAI-compatible endpoint — no internet connection is needed after the initial download.