Ollama: "model requires more system memory than is available"

Name: LLM Configurator — GPU VRAM Checker
Author: LLM Configurator

Author: Jakub Rusinowski · Last updated: June 15, 2026

Founder, LLM Configurator — AI educator & workshop leader on local LLM deployment

The error

Error: model requires more system memory (9.2 GiB) than is available (7.6 GiB)

When you see it

You run ollama run <model> and instead of a prompt you get this one-liner and an immediate exit. It happens most on 8 GB and 16 GB machines, and especially on laptops where a chunk of RAM is already spoken for.

What's actually going on

Ollama does a sanity check before loading: it estimates how much memory the model plus its context will need and compares that to what is free right now (not your total installed RAM). If the estimate is higher than the free pool, it refuses to start rather than thrash or get OOM-killed mid-load. The first number in the error is that estimate, which grows with model size and context length; the second is the memory currently free.
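To make the comparison concrete, here is a minimal sketch that pulls the two figures out of the error text and works out the shortfall. The helper names (parse_mem_error, shortfall_gib, mem_available_gib) are made up for this example and are not part of Ollama. Reading MemAvailable from /proc/meminfo is an assumption about what "free" roughly corresponds to on Linux, not a statement of how Ollama measures it.

```shell
# Hypothetical helper: extract "required available" (in GiB) from the error text.
parse_mem_error() {
  printf '%s\n' "$1" |
    sed -n 's/.*memory (\([0-9.]*\) GiB).*available (\([0-9.]*\) GiB).*/\1 \2/p'
}

# Hypothetical helper: GiB you still need to free or save (never negative).
shortfall_gib() {
  awk -v need="$1" -v have="$2" \
    'BEGIN { d = need - have; if (d < 0) d = 0; printf "%.1f\n", d }'
}

# Linux only (assumption): MemAvailable as a stand-in for "free right now".
mem_available_gib() {
  awk '/^MemAvailable:/ { printf "%.1f\n", $2 / 1048576 }' /proc/meminfo
}

msg='Error: model requires more system memory (9.2 GiB) than is available (7.6 GiB)'
read -r need have <<EOF
$(parse_mem_error "$msg")
EOF
echo "need $need GiB, have $have GiB, short by $(shortfall_gib "$need" "$have") GiB"

if [ -r /proc/meminfo ]; then
  echo "currently available: $(mem_available_gib) GiB"
fi
```

For the error shown above, the gap is 1.6 GiB. That is the amount the fixes below have to recover, either by shrinking the requirement or by freeing memory.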

How to fix it

1. Pull a smaller tag of the same model (most common fix)

Most Ollama models publish several sizes and quantization levels as tags. If llama3.1 (8B by default) is too heavy, the family usually has a smaller sibling, such as llama3.2 in its :1b and :3b sizes, or a more aggressive quant of the same model, such as a q4_0 tag. Dropping to a smaller tag is the cleanest fix and keeps you in the same model family. Check what your machine can actually hold before you pull, so you only download once.

# instead of the heavy default tag:
ollama run llama3.2:3b
# or pick a smaller quant of the model you want:
ollama run llama3.1:8b-instruct-q4_0
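For a rough sense of how much a smaller size or quant saves, you can estimate the weights alone as parameters times bits per weight, divided by 8. The sketch below does that arithmetic; approx_weights_gib is a made-up helper name. It assumes 4.5 bits per weight for q4_0 and 16 for fp16, and it ignores the context and runtime overhead that Ollama adds on top, so treat the result as a lower bound rather than the figure the error will report.

```shell
# Hypothetical helper: rough weight size in GiB.
# args: parameters in billions, bits per weight
approx_weights_gib() {
  awk -v b="$1" -v bpw="$2" \
    'BEGIN { printf "%.1f\n", b * 1e9 * bpw / 8 / 1073741824 }'
}

approx_weights_gib 8 4.5   # 8B at q4_0  -> 4.2
approx_weights_gib 3 4.5   # 3B at q4_0  -> 1.6
approx_weights_gib 8 16    # 8B at fp16  -> 14.9
```

Moving from 8B to 3B at the same quant cuts the weights by more than half, which is why a smaller tag is usually the first thing to try.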


2. Close what is eating your RAM

The check is against *free* memory, so a browser with 40 tabs or another model still resident can be the difference between loading and failing. Quit the obvious offenders and, if a previous Ollama model is still loaded, unload it first.

ollama ps          # see what is currently loaded
ollama stop <model> # unload it to free memory

3. Shrink the context window

A big context allocation inflates the estimate. If you do not need a huge window, cap it — this can pull the requirement back under your free memory.

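# the server reads this variable, so set it where the server starts (stop any running instance first)
OLLAMA_CONTEXT_LENGTH=2048 ollama serve
# then, in a second terminal:
ollama run llama3.2:3b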
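To see why the context window matters so much, here is a back-of-the-envelope sketch using the standard transformer KV-cache estimate: 2 (keys and values) x layers x KV heads x head dimension x context tokens x bytes per element. The helper name kv_cache_bytes is made up for this example. The model shape (28 layers, 8 KV heads, head dimension 128, fp16 cache) is an assumption for a Llama-3.2-3B-class model, and Ollama's own estimate may count things differently.

```shell
# Hypothetical helper: bytes needed for keys and values across all layers.
# args: layers kv_heads head_dim context_tokens bytes_per_element
kv_cache_bytes() {
  echo $(( 2 * $1 * $2 * $3 * $4 * $5 ))
}

# Assumed shape: 28 layers, 8 KV heads, head dim 128, 2 bytes per element (fp16)
for ctx in 2048 8192 131072; do
  bytes=$(kv_cache_bytes 28 8 128 "$ctx" 2)
  echo "$ctx tokens: $(( bytes / 1048576 )) MiB"
done
# 2048 tokens: 224 MiB
# 8192 tokens: 896 MiB
# 131072 tokens: 14336 MiB
```

Under these assumptions a 2048-token window costs a couple of hundred MiB, while a full 128K window costs more than the weights themselves. Capping the context is often enough to close a gap of a gigabyte or two.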

4. Add swap as a last resort

On Linux you can give the system swap space so the loader has somewhere to spill. This will run, but it will be slow once it touches disk — treat it as a stopgap, not a solution. The durable answer is still a model that fits in real RAM.
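If you do go this route, a minimal sketch for a swap file on Linux looks like the following. It assumes a filesystem where fallocate-created swap files work (ext4, for example) and that you run it as root. The path, the 4G size, the add_swapfile name, and the RUN dry-run hook are all illustrative choices, not anything Ollama requires.

```shell
# Sketch: create and enable a swap file on Linux (needs root when run for real).
# RUN is a hypothetical dry-run hook: RUN=echo prints each command instead of running it.
add_swapfile() {
  path="${1:-/swapfile}"
  size="${2:-4G}"
  ${RUN:-} fallocate -l "$size" "$path" &&
    ${RUN:-} chmod 600 "$path" &&
    ${RUN:-} mkswap "$path" &&
    ${RUN:-} swapon "$path"
}

# Preview the commands without touching the system:
RUN=echo add_swapfile /swapfile 4G
```

Once the preview looks right, run add_swapfile /swapfile 4G as root and confirm with swapon --show or free -h. The swap file will not survive a reboot unless you also add it to /etc/fstab.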


Frequently asked questions

Does this mean my computer is too weak for local AI?

Not at all. It means this particular model is too big for your free memory. There are capable 1B–3B models that run comfortably on 8 GB machines — you just need to pick one sized for your hardware.

Can I force Ollama to run the model anyway?

You can lower the requirement (smaller tag, smaller context, freeing RAM) but you should not try to bypass the check itself — it exists to stop the model from getting killed mid-load or freezing your machine. Reduce the requirement instead of overriding the guardrail.

Why does it say less is available than my total RAM?

The error compares against currently free memory, not installed RAM. Your OS, background apps, and any already-loaded model are using some of it. Closing things or rebooting frees more of the pool.