Written by Jakub Rusinowski · Last updated June 15, 2026
Founder, LLM Configurator — AI educator & workshop leader on local LLM deployment
Error: write /root/.ollama/models/blobs/sha256-...: no space left on device
A model pull or a Hugging Face download runs for a while and then dies — "no space left on device", a truncated file, or a checksum/verification failure near the end. It's most common with the big quants and the larger models, which can be tens of gigabytes each.
Two things bite here. First, GGUF files are large, and a few high-quant downloads will quietly fill a disk, or a partition (like / or your home folder) that's smaller than you think. Second, a download can need more free space than the final file size while it lands: depending on the tool, data goes into a partial or temporary file, is sometimes verified, and is then moved into place, and a move between filesystems is really a copy. A flaky connection, or hitting the ceiling mid-write, leaves a corrupt partial file behind.
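To catch the problem before a long download rather than near the end, you can check free space on the target filesystem first. Below is a minimal bash sketch. `has_free_space` is a hypothetical helper name, and the headroom you ask for on top of the file size is your own guess, not a number any tool publishes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the filesystem holding DIR has at
# least NEED_GIB GiB available. `df -P -k` gives a stable, non-wrapping
# format on Linux and macOS.
has_free_space() {
  local dir=$1 need_gib=$2 avail_kb
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 2; }
  # Row 2, column 4 of `df -P -k` is the available space in 1 KiB blocks.
  avail_kb=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
  # Compare in KiB: NEED_GIB * 1024 * 1024.
  [ "$avail_kb" -ge $(( need_gib * 1024 * 1024 )) ]
}

# Example: before pulling a ~40 GB file, ask for ~50 GiB
# (the 25% headroom is an arbitrary margin):
#   has_free_space "$OLLAMA_MODELS" 50 || echo "free up space first"
```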
The quant you choose is also the file size you're committing to disk. A Q8_0 file is roughly 1.7 to 1.8 times the size of the same model's Q4_K_M (about 8.5 versus roughly 4.8 bits per weight), for a quality difference you likely won't notice. A right-sized quant means a smaller download that fits your disk and is more likely to fit in your GPU's VRAM, so it solves the storage problem and the VRAM problem at once. Check which quant you actually need before pulling a 40 GB file.
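As a rough rule of thumb, file size is parameter count times bits per weight, divided by 8. The sketch below assumes approximate average bits per weight (8.5 for Q8_0, about 4.8 for Q4_K_M, 16 for F16); real files differ a little because of metadata and tensors kept at other precisions, so treat the result as a ballpark. `estimate_gguf_gb` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Rough GGUF size: params (billions) x bits per weight / 8 = GB.
# Bits-per-weight values are approximate averages (assumption):
#   Q8_0 stores 34 bytes per 32 weights = 8.5 bits;
#   Q4_K_M mixes block types and lands near 4.8 bits.
estimate_gguf_gb() {
  local params_b=$1 quant=$2 bpw
  case $quant in
    F16)    bpw=16 ;;
    Q8_0)   bpw=8.5 ;;
    Q4_K_M) bpw=4.8 ;;
    *) echo "unknown quant: $quant" >&2; return 1 ;;
  esac
  # awk does the floating-point math that plain shell arithmetic cannot.
  awk -v p="$params_b" -v b="$bpw" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

for q in Q4_K_M Q8_0; do
  echo "8B at $q: ~$(estimate_gguf_gb 8 "$q") GB"
done
```

For an 8B model the loop prints about 4.8 GB for Q4_K_M and 8.5 GB for Q8_0.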
See where space is going and clear room. Old models you no longer use are the easiest win — each can be many gigabytes.
df -h . # free space on this filesystem
ollama list # see installed models and sizes
ollama rm <model> # remove ones you do not need
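`ollama list` shows sizes per model, but it can also help to look at the raw files, including Hugging Face downloads that `ollama list` knows nothing about. A small sketch with a hypothetical `largest_files` helper; the default paths are assumptions (a user-run Ollama keeps models in ~/.ollama/models, the Linux system service commonly uses /usr/share/ollama/.ollama/models, and the Hugging Face cache defaults to ~/.cache/huggingface). Blobs can be shared between models, so remove models with `ollama rm` rather than deleting blob files by hand.

```shell
#!/usr/bin/env bash
# List the COUNT largest files under DIR, biggest first (sizes in KiB).
largest_files() {
  local dir=$1 count=${2:-10}
  [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
  find "$dir" -type f -exec du -k {} + | sort -rn | head -n "$count"
}

# Check the usual model locations, skipping any that do not exist.
for d in "${OLLAMA_MODELS:-$HOME/.ollama/models}" "${HF_HOME:-$HOME/.cache/huggingface}"; do
  if [ -d "$d" ]; then
    echo "== $d"
    largest_files "$d" 5
  fi
done
```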
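If your system disk is small but you have a larger drive, move the model directory there. Ollama respects OLLAMA_MODELS; Hugging Face respects HF_HOME. Set the location to the roomy disk before downloading. For Ollama, the variable is read by the server (ollama serve), not by the ollama pull client, so it has to be set where the server starts, and the server restarted; exporting it in your shell only helps if you launch ollama serve from that same shell.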
export OLLAMA_MODELS=/mnt/big-drive/ollama
# or for Hugging Face downloads:
export HF_HOME=/mnt/big-drive/hf
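If Ollama runs as a systemd service (the usual setup from the Linux install script), an export in your shell never reaches the server. The equivalent of `sudo systemctl edit ollama` is a drop-in file under /etc/systemd/system/ollama.service.d/. The sketch below writes one; `write_ollama_dropin` is a hypothetical helper, and it overwrites any existing override.conf, so merge by hand if you already have one.

```shell
#!/usr/bin/env bash
# Write a systemd drop-in that sets OLLAMA_MODELS for the service.
# NOTE: overwrites DROPIN_DIR/override.conf if it already exists.
write_ollama_dropin() {
  local dropin_dir=$1 models_dir=$2
  mkdir -p "$dropin_dir"
  printf '[Service]\nEnvironment="OLLAMA_MODELS=%s"\n' "$models_dir" \
    > "$dropin_dir/override.conf"
}

# Usage on a standard Linux install, from a root shell:
#   write_ollama_dropin /etc/systemd/system/ollama.service.d /mnt/big-drive/ollama
#   chown -R ollama:ollama /mnt/big-drive/ollama   # service user must own the dir
#   systemctl daemon-reload && systemctl restart ollama
```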
For large pulls, a dropped connection shouldn't mean starting over. Ollama resumes an interrupted pull if you re-run it; for direct Hugging Face downloads, use a tool that supports resuming so you don't re-fetch tens of gigabytes.
# Ollama: just re-run, it resumes
ollama pull <model>
# HF: the CLI resumes partial downloads
huggingface-cli download <repo> <file.gguf>
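Because both commands pick up where they left off, an unattended retry loop is often enough on a flaky connection. A generic sketch; `retry`, the attempt limit, and the delay are illustrative choices, not features of either tool.

```shell
#!/usr/bin/env bash
# Run a command until it succeeds, at most MAX times, sleeping DELAY
# seconds between attempts. Returns 1 if every attempt fails.
retry() {
  local max=$1 delay=$2
  shift 2
  local n=1
  until "$@"; do
    if [ "$n" -ge "$max" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed; retrying in ${delay}s" >&2
    n=$((n + 1))
    sleep "$delay"
  done
}

# Examples (placeholders as in the commands above):
#   retry 5 30 ollama pull <model>
#   retry 5 30 huggingface-cli download <repo> <file.gguf>
```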
How much disk a model takes varies with model size and quant, from under a gigabyte for small models at low quant to 40 GB+ for large models at high quant. The quant level is the main lever you control; lower quants are much smaller.
A download that dies partway through usually means the disk filled mid-write or the connection dropped. Free space first, then use a resuming download (re-run ollama pull, or the HF CLI) so a blip does not waste the whole transfer.
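To confirm a finished file is intact rather than just present, compare its SHA-256 digest with a known-good value, for example a checksum the publisher lists for the file. Ollama blob filenames already carry their digest (sha256-<digest>), so those can be checked against their own names. Hashing tens of gigabytes takes a while. The helpers below are hypothetical, and skipping files with a -partial suffix assumes that is how Ollama names in-progress downloads.

```shell
#!/usr/bin/env bash
# Print the SHA-256 hex digest of a file (coreutils, or shasum on macOS).
file_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{ print $1 }'
  else
    shasum -a 256 "$1" | awk '{ print $1 }'
  fi
}

# Succeed only if FILE's digest equals EXPECTED.
verify_sha256() {
  local file=$1 expected=$2 actual
  actual=$(file_sha256 "$file") || return 1
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}

# Check every completed blob in a blobs directory against its filename.
# Files containing "-partial" are skipped (assumed in-progress downloads).
verify_ollama_blobs() {
  local blob rc=0
  for blob in "$1"/sha256-*; do
    [ -f "$blob" ] || continue
    case $blob in *-partial*) continue ;; esac
    verify_sha256 "$blob" "${blob##*/sha256-}" || rc=1
  done
  return $rc
}

# Example: verify_ollama_blobs "${OLLAMA_MODELS:-$HOME/.ollama/models}/blobs"
```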
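Models can live on another drive. Set OLLAMA_MODELS (Ollama) or HF_HOME (Hugging Face) to a larger drive before downloading. Existing Ollama models can be relocated too, by moving the whole models directory (blobs and manifests together) and pointing OLLAMA_MODELS at the new location. This keeps big GGUF files off a small system partition.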
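Moving an existing Ollama store is a directory move plus the variable change. A sketch with a hypothetical `relocate_models` helper: stop the Ollama server first so nothing is written mid-move, and note that `mv` across filesystems copies and then deletes, so the destination needs room for the whole directory.

```shell
#!/usr/bin/env bash
# Move an Ollama models directory (blobs/ + manifests/) from SRC to DEST.
# Refuses to run if SRC does not look like a models dir or DEST exists.
relocate_models() {
  local src=$1 dest=$2
  if [ ! -d "$src/blobs" ]; then
    echo "not a models dir: $src" >&2
    return 1
  fi
  if [ -e "$dest" ]; then
    echo "refusing to overwrite: $dest" >&2
    return 1
  fi
  mkdir -p "$(dirname "$dest")"
  mv "$src" "$dest" && echo "now set OLLAMA_MODELS=$dest and restart the server"
}

# Example (paths are placeholders; stop Ollama first):
#   relocate_models "$HOME/.ollama/models" /mnt/big-drive/ollama
```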