Mistral Small 4 — Local AI Model by Mistral AI

作者： Jakub Rusinowski · 最后更新： 2026年7月30日

Mistral AI's March 16, 2026 release unifying its former Magistral/Pixtral/Devstral lines into a single 119B-total / ~6.5B-active MoE model (128 experts, 4 active per token) with text + image input, reasoning, and agentic coding. Apache 2.0 licensed, 256K context. At Q4 it fits a single 24GB GPU (RTX 4090).

Hardware Requirements

Mistral Small 4 119B-A6.5B Min 73 GB VRAM · Q4_K_M · 256,000 ctx · ollama run mistral-small (community GGUF quants; check tag for 119B build)

Recommended GPU

The cheapest GPU that runs Mistral Small 4 locally (min 73 GB VRAM) is the AMD Ryzen AI Max+ 395 (96 GB).

联盟营销声明: 本页部分链接为联盟推广链接——如果你通过它们购买，LLM Configurator 可能会获得佣金，而你无需支付任何额外费用。作为亚马逊联盟成员（Amazon Associate），LLM Configurator 会从符合条件的购买中获得收益。

GMKtec EVO-X2 (Ryzen AI Max+ 395, 128GB)

首发建议零售价：$2,349

2026年价格波动较大——请以当前商品页价格为准。

在亚马逊查看价格

How to Run Locally

Install Ollama then run: ollama run mistral-small (community GGUF quants; check tag for 119B build)

Minimum VRAM: 73 GB. For best results use Q4_K_M quantization.

Mistral Small 4 — Frequently Asked Questions

How much VRAM does Mistral Small 4 need?

Mistral Small 4 needs about 73 GB VRAM at Q4_K_M quantization for its smallest variant. Variants: Mistral Small 4 119B-A6.5B (73 GB, Q4_K_M). On Apple Silicon, unified memory counts toward this requirement.

Can I run Mistral Small 4 on an RTX 4090 (24 GB)?

Mistral Small 4's smallest variant needs about 73 GB, which exceeds a single RTX 4090 (24 GB). Use multiple GPUs, a higher-VRAM card, or Apple Silicon with large unified memory.

What quantization should I use for Mistral Small 4?

Q4_K_M is the best balance of quality and VRAM for Mistral Small 4 in most cases. Choose Q8_0 for near-lossless quality if you have spare VRAM, or smaller quants (Q3/Q2) only when memory is tight.

How do I run Mistral Small 4 with Ollama?

Install Ollama, then run: ollama run mistral-small (community GGUF quants; check tag for 119B build). This downloads Mistral Small 4 and starts a local, OpenAI-compatible endpoint — no internet connection is needed after the initial download.