Troubleshooting local LLM errors

Name: LLM Configurator — GPU VRAM Checker
Author: LLM Configurator

Paste the error you're seeing. Most local-AI errors come down to one thing: the model needs more memory than your hardware has. Each guide below tells you how to fix it.
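
As a rough illustration of "the model is bigger than your hardware", the sketch below estimates how much memory the weights alone need from the parameter count and the average bits per weight. The function names, the example bits-per-weight figure, and the default `overhead_gib` headroom are assumptions made for illustration, not measurements from any particular runtime.

```python
def estimate_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB.

    bits_per_weight is an average: 16 for fp16, 8 for an 8-bit format,
    and roughly 4 to 5 for common 4-bit quantizations (approximate values).
    """
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


def fits_in_memory(n_params: float, bits_per_weight: float,
                   available_gib: float, overhead_gib: float = 1.5) -> bool:
    """Return True if weights plus an assumed fixed overhead fit.

    overhead_gib is a guessed allowance for the runtime, context buffers,
    and the KV cache at short context; real usage varies.
    """
    needed = estimate_weight_gib(n_params, bits_per_weight) + overhead_gib
    return needed <= available_gib


if __name__ == "__main__":
    for label, params in [("7B", 7e9), ("70B", 70e9)]:
        size = estimate_weight_gib(params, 4.5)
        print(f"{label} at ~4.5 bits/weight: about {size:.1f} GiB of weights")
```

If the estimate is already close to your VRAM or RAM, expect one of the errors listed below once the context and runtime overhead are added on top.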

NVIDIA / CUDA

CUDA out of memory — why it happens and how to fix it — The model starts loading, the GPU fans spin up, and then it dies — usually the moment the weights or the first batch hit…
"No CUDA-capable device is detected" — getting your GPU seen again — Your code or runtime can't find the GPU at all — `torch.cuda.is_available()` returns False, and anything that needs CUDA…
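
Before digging into either guide, it can help to see what your Python environment actually reports. The sketch below is one possible quick check, assuming PyTorch may or may not be installed. The `cuda_report` helper and its dictionary keys are hypothetical names; the PyTorch calls used are `torch.cuda.is_available()`, `torch.cuda.device_count()`, and `torch.version.cuda`.

```python
import os
import shutil


def cuda_report() -> dict:
    """Collect a few facts about GPU visibility without raising."""
    report = {
        # Is the NVIDIA driver's command-line tool on PATH at all?
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # An empty or wrong value here can hide every GPU from CUDA programs.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }
    try:
        import torch
    except ImportError:
        report["torch_installed"] = False
        return report

    report["torch_installed"] = True
    # None here means a CPU-only PyTorch build was installed.
    report["torch_built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

A `torch_built_with_cuda` value of `None` points at the PyTorch install rather than the driver, which narrows down which guide applies.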

Apple Silicon

Apple Silicon: model won't use the GPU / Metal not engaged — On an M1/M2/M3/M4 Mac the model either runs slowly with no GPU activity in Activity Monitor, or it dies with a Metal buf…

AMD / ROCm

AMD GPU not detected / ROCm unsupported (Ollama & llama.cpp) — You have a Radeon card but Ollama runs on the CPU, or a ROCm build reports no devices. Common variants include `rocBLAS …

All platforms

Ollama: "model requires more system memory than is available" — You run `ollama run <model>` and instead of a prompt you get this one-liner and an immediate exit. It happens most on 8 …
Model loads but runs painfully slow (it is on your CPU, not your GPU) — The model loads fine and answers correctly — but it crawls, a few tokens per second or worse, with your CPU fans roaring…
LM Studio: "Failed to load model" (insufficient memory) — You pick a model in LM Studio, hit load, the progress bar moves — and then it fails with a red error. The message varies…
Out of memory at long context (the KV cache, not the weights) — The model loads without complaint and answers short prompts — then falls over once you feed it a long document or a long…
Which GGUF quant should I download? (Q4 vs Q5 vs Q8) — You found the model on Hugging Face and the repo has a dozen `.gguf` files: Q2_K, Q3_K_M, Q4_K_M, Q5_K_M, Q6_K, Q8_0, an…
Out of disk space / failed GGUF download for large models — A model pull or a Hugging Face download runs for a while and then dies — "no space left on device", a truncated file, or…
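
For the long-context entry above, the reason memory grows with prompt length can be sketched with the usual KV cache size estimate: two tensors (keys and values) per layer, each holding one vector per KV head per token. The layer count, KV head count, and head dimension in the example are illustrative assumptions, not the configuration of a specific model, and real runtimes may add padding or use a quantized cache.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_element: int = 2) -> int:
    """Estimate KV cache size in bytes for a single sequence.

    The leading 2 accounts for storing both keys and values.
    bytes_per_element is 2 for an fp16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_element


if __name__ == "__main__":
    # Assumed example architecture: 32 layers, 8 KV heads, head dimension 128.
    for ctx in (8_192, 32_768):
        gib = kv_cache_bytes(32, 8, 128, ctx) / 2**30
        print(f"context {ctx}: about {gib:.1f} GiB of KV cache")
```

With these assumed numbers, an 8,192-token context needs 1 GiB for the cache and a 32,768-token context needs 4 GiB, on top of the weights. That is why a model that loads fine can still run out of memory on a long document.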