MiniMax M2.5 — Local AI Model by MiniMax

Written by Jakub Rusinowski · Last updated June 15, 2026

MiniMax's 230B MoE model scoring 80.2% on SWE-bench — technically outperforming Claude Opus 4.6 on this benchmark — at 22.7× lower cost on output tokens. Supports up to 1M token context and includes multimodal capabilities. Available under a modified MIT license requiring attribution. Primarily API-accessed but self-hostable on multi-GPU infrastructure.

Hardware Requirements

MiniMax M2.5 230BMin 130 GB VRAM · Q4_K_M · 1,000,000 ctx ·

How to Run Locally

Install Ollama then run: ollama run

Minimum VRAM: 130 GB. For best results use Q4_K_M quantization.

MiniMax M2.5 — Frequently Asked Questions

How much VRAM does MiniMax M2.5 need?

MiniMax M2.5 needs about 130 GB VRAM at Q4_K_M quantization for its smallest variant. Variants: MiniMax M2.5 230B (130 GB, Q4_K_M). On Apple Silicon, unified memory counts toward this requirement.

Can I run MiniMax M2.5 on an RTX 4090 (24 GB)?

MiniMax M2.5's smallest variant needs about 130 GB, which exceeds a single RTX 4090 (24 GB). Use multiple GPUs, a higher-VRAM card, or Apple Silicon with large unified memory.

What quantization should I use for MiniMax M2.5?

Q4_K_M is the best balance of quality and VRAM for MiniMax M2.5 in most cases. Choose Q8_0 for near-lossless quality if you have spare VRAM, or smaller quants (Q3/Q2) only when memory is tight.

How do I run MiniMax M2.5 with Ollama?

Install Ollama, then run: ollama run . This downloads MiniMax M2.5 and starts a local, OpenAI-compatible endpoint — no internet connection is needed after the initial download.

Can I Run MiniMax M2.5 on My GPU?