Author: Jakub Rusinowski · Last updated: June 15, 2026
Founder, LLM Configurator — AI educator & workshop leader on local LLM deployment
Error: model requires more system memory (9.2 GiB) than is available (7.6 GiB)
You run ollama run <model> and instead of a prompt you get this one-liner and an immediate exit. It happens most on 8 GB and 16 GB machines, and especially on laptops where a chunk of RAM is already spoken for.
Ollama does a sanity check before loading: it estimates how much memory the model plus its context will need and compares that to what is actually free right now (not your total installed RAM). If the estimate is higher than the free pool, it refuses to start rather than thrash or get OOM-killed mid-load. The number in the error is that estimate — and it grows with the model size and the context length.
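The comparison described above can be approximated by hand on Linux. The following is a minimal sketch, not Ollama's own logic: the helper names `available_mib` and `mem_verdict` are hypothetical, and it assumes a kernel that exposes `MemAvailable` in `/proc/meminfo`. The example call uses the two figures from the error message, converted to MiB.

```shell
# Rough pre-flight check (Linux only). This is an illustration of the idea,
# NOT the check Ollama itself performs.

# available_mib: print the memory the kernel reports as available, in MiB.
available_mib() {
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}

# mem_verdict REQUIRED_MIB AVAILABLE_MIB: print "fits" or "too big".
mem_verdict() {
  if [ "$1" -le "$2" ]; then echo "fits"; else echo "too big"; fi
}

# Numbers from the error: 9.2 GiB required (~9421 MiB), 7.6 GiB free (~7782 MiB).
mem_verdict 9421 7782   # prints "too big"
```

To test against the live machine instead of fixed numbers, pass the current value: `mem_verdict 9421 "$(available_mib)"`.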
Most Ollama models publish several sizes and quantization levels as tags. If llama3.1 (the default 8B) is too heavy, there is usually a smaller sibling in the same family, such as llama3.2:1b or llama3.2:3b, or a more aggressive quant such as :q4_0. Dropping to a smaller tag is the cleanest fix and keeps you on the same model family. Check what your machine can actually hold before you pull, so you only download once.
# instead of the heavy default tag:
ollama run llama3.2:3b
# or pick a smaller quant of the model you want:
ollama run llama3.1:8b-instruct-q4_0
The check is against *free* memory, so a browser with 40 tabs or another model still resident can be the difference. Quit the obvious offenders and, if a previous Ollama model is still loaded, unload it first.
ollama ps # see what is currently loaded
ollama stop <model> # unload it to free memory
A big context allocation inflates the estimate. If you do not need a huge window, cap it — this can pull the requirement back under your free memory.
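# OLLAMA_CONTEXT_LENGTH is read by the Ollama server, so set it where the server starts
# (stop any already-running Ollama instance first), then run the model from another terminal
OLLAMA_CONTEXT_LENGTH=2048 ollama serve
ollama run llama3.2:3b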
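To see why the context window moves the estimate, it helps to work out the size of the key/value cache, which grows linearly with context length. The sketch below uses the standard transformer KV-cache arithmetic; the function name `kv_cache_mib` is hypothetical, and the model shape (32 layers, 8 KV heads, head dimension 128, 16-bit cache values) is an assumption chosen to resemble an 8B Llama-style model. Ollama's real estimate also covers the weights and other buffers, so treat these figures as the context-dependent part only.

```shell
# kv_cache_mib CTX LAYERS KV_HEADS HEAD_DIM BYTES_PER_VALUE
# Estimated KV-cache size in MiB: 2 (keys and values) x layers x KV heads
# x head dimension x bytes per value, for each token in the context window.
kv_cache_mib() {
  echo $(( 2 * $2 * $3 * $4 * $5 * $1 / 1024 / 1024 ))
}

# Assumed shape: 32 layers, 8 KV heads, head dimension 128, 2 bytes per value.
kv_cache_mib 8192 32 8 128 2   # prints 1024
kv_cache_mib 2048 32 8 128 2   # prints 256
```

Under these assumptions, cutting the window from 8192 to 2048 tokens trims roughly 768 MiB from the requirement, which can be enough to get under the free-memory line.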
On Linux you can give the system swap space so the loader has somewhere to spill. This will run, but it will be slow once it touches disk — treat it as a stopgap, not a solution. The durable answer is still a model that fits in real RAM.
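For reference, adding a swap file on Linux usually takes four commands. The sketch below only prints them for review instead of running them, since they need root and change system state. The function name `swap_plan`, the 8G size, and the `/swapfile` path are illustrative assumptions, and on some filesystems `fallocate` does not produce a file that `swapon` accepts, in which case the file has to be created another way (for example with `dd`).

```shell
# swap_plan SIZE FILE: print (do not run) the usual commands for adding a
# swap file on Linux, so they can be reviewed before being run as root.
swap_plan() {
  printf '%s\n' \
    "sudo fallocate -l $1 $2" \
    "sudo chmod 600 $2" \
    "sudo mkswap $2" \
    "sudo swapon $2"
}

# Illustrative size and path; adjust both to your disk and needs.
swap_plan 8G /swapfile
```

A swap file enabled this way lasts until the next reboot unless it is also registered in `/etc/fstab`.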
This error does not mean your machine is too weak for local models. It means this particular model is too big for your free memory. There are capable 1B–3B models that run comfortably on 8 GB machines; you just need to pick one sized for your hardware.
You can lower the requirement (smaller tag, smaller context, freeing RAM) but you should not try to bypass the check itself — it exists to stop the model from getting killed mid-load or freezing your machine. Reduce the requirement instead of overriding the guardrail.
The error compares against currently free memory, not installed RAM. Your OS, background apps, and any already-loaded model are using some of it. Closing things or rebooting frees more of the pool.