Troubleshooting local LLM errors
Find the error you're seeing below. Most local-AI errors come down to one thing: the model needs more memory than your hardware has. Each guide tells you exactly how to fix it.
- Ollama: "model requires more system memory than is available" — You run `ollama run <model>` and instead of a prompt you get this one-liner and an immediate exit. It happens most on 8 …
- Model loads but runs painfully slow (it is on your CPU, not your GPU) — The model loads fine and answers correctly — but it crawls, a few tokens per second or worse, with your CPU fans roaring…
- LM Studio: "Failed to load model" (insufficient memory) — You pick a model in LM Studio, hit load, the progress bar moves — and then it fails with a red error. The message varies…
- Out of memory at long context (the KV cache, not the weights) — The model loads without complaint and answers short prompts — then falls over once you feed it a long document or a long…
- Which GGUF quant should I download? (Q4 vs Q5 vs Q8) — You found the model on Hugging Face and the repo has a dozen `.gguf` files: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, an…
- Out of disk space / failed GGUF download for large models — A model pull or a Hugging Face download runs for a while and then dies — "no space left on device", a truncated file, or…
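Several of the guides above come back to the same arithmetic: whether the weights plus the KV cache fit in the memory you have. As a rough sketch (not a substitute for what your runtime actually reports), the estimate below uses the standard formulas: weight size is parameter count times bits per weight, and the KV cache holds two tensors (keys and values) per layer. The function names, the 1 GiB headroom, and the example model shape (8 billion parameters, 32 layers, 8 KV heads, head dimension 128, about 4.5 bits per weight) are illustrative assumptions, not figures from any specific model or guide.

```python
GIB = 2**30  # bytes per GiB


def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size in GiB.

    Two tensors (K and V) per layer, each of shape
    context_len x n_kv_heads x head_dim. bytes_per_elem=2 assumes
    an unquantized 16-bit cache.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * context_len * bytes_per_elem) / GIB


def fits(total_gib: float, available_gib: float,
         headroom_gib: float = 1.0) -> bool:
    """True if the model plus a safety margin fits in available memory.

    headroom_gib is a hypothetical allowance for the runtime and OS.
    """
    return total_gib + headroom_gib <= available_gib


# Hypothetical model shape, for illustration only.
w = weights_gib(8e9, 4.5)
kv_short = kv_cache_gib(32, 8, 128, context_len=8192)
kv_long = kv_cache_gib(32, 8, 128, context_len=32768)

print(f"weights:        {w:.2f} GiB")
print(f"KV cache @ 8k:  {kv_short:.2f} GiB")
print(f"KV cache @ 32k: {kv_long:.2f} GiB")
print("fits in 8 GiB at 8k context: ", fits(w + kv_short, 8.0))
print("fits in 8 GiB at 32k context:", fits(w + kv_long, 8.0))
```

With these assumed numbers the weights stay the same size while the KV cache grows linearly with context length, which is why a model can load and answer short prompts yet run out of memory on a long document.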