Autor: Jakub Rusinowski · Ostatnia aktualizacja: 15 czerwca 2026
MiniMax's 230B MoE model scoring 80.2% on SWE-bench — technically outperforming Claude Opus 4.6 on this benchmark — at 22.7× lower cost on output tokens. Supports up to 1M token context and includes multimodal capabilities. Available under a modified MIT license requiring attribution. Primarily API-accessed but self-hostable on multi-GPU infrastructure.
| MiniMax M2.5 230B | Min 130 GB VRAM · Q4_K_M · 1,000,000 ctx · |
Install Ollama then run: ollama run
Minimum VRAM: 130 GB. For best results use Q4_K_M quantization.
MiniMax M2.5 needs about 130 GB VRAM at Q4_K_M quantization for its smallest variant. Variants: MiniMax M2.5 230B (130 GB, Q4_K_M). On Apple Silicon, unified memory counts toward this requirement.
MiniMax M2.5's smallest variant needs about 130 GB, which exceeds a single RTX 4090 (24 GB). Use multiple GPUs, a higher-VRAM card, or Apple Silicon with large unified memory.
Q4_K_M is the best balance of quality and VRAM for MiniMax M2.5 in most cases. Choose Q8_0 for near-lossless quality if you have spare VRAM, or smaller quants (Q3/Q2) only when memory is tight.
Install Ollama, then run: ollama run . This downloads MiniMax M2.5 and starts a local, OpenAI-compatible endpoint — no internet connection is needed after the initial download.