作者: Jakub Rusinowski · 最后更新: 2026年6月15日
Mistral AI's March 16, 2026 release unifying its former Magistral/Pixtral/Devstral lines into a single 119B-total / ~6.5B-active MoE model (128 experts, 4 active per token) with text + image input, reasoning, and agentic coding. Apache 2.0 licensed, 256K context. At Q4 it fits a single 24GB GPU (RTX 4090).
| Mistral Small 4 119B-A6.5B | Min 24 GB VRAM · Q4_K_M · 256,000 ctx · ollama run mistral-small (community GGUF quants; check tag for 119B build) |
Install Ollama then run: ollama run mistral-small (community GGUF quants; check tag for 119B build)
Minimum VRAM: 24 GB. For best results use Q4_K_M quantization.
Mistral Small 4 needs about 24 GB VRAM at Q4_K_M quantization for its smallest variant. Variants: Mistral Small 4 119B-A6.5B (24 GB, Q4_K_M). On Apple Silicon, unified memory counts toward this requirement.
Yes — Mistral Small 4 runs on an RTX 4090 (24 GB) and other 24 GB cards such as the RTX 3090. Smaller variants also fit comfortably on 8–16 GB GPUs at Q4_K_M.
Q4_K_M is the best balance of quality and VRAM for Mistral Small 4 in most cases. Choose Q8_0 for near-lossless quality if you have spare VRAM, or smaller quants (Q3/Q2) only when memory is tight.
Install Ollama, then run: ollama run mistral-small (community GGUF quants; check tag for 119B build). This downloads Mistral Small 4 and starts a local, OpenAI-compatible endpoint — no internet connection is needed after the initial download.