Out of disk space / failed GGUF download for large models


Written by Jakub Rusinowski · Last updated June 15, 2026

Founder, LLM Configurator — AI educator & workshop leader on local LLM deployment

The error

Error: write /root/.ollama/models/blobs/sha256-...: no space left on device

When you see it

A model pull or a Hugging Face download runs for a while and then dies — "no space left on device", a truncated file, or a checksum/verification failure near the end. It's most common with the big quants and the larger models, which can be tens of gigabytes each.

What's actually going on

Two things bite here. First, GGUF files are large, and a few high-quant downloads will quietly fill a disk, or a partition (such as / or your home folder) that is smaller than you think. Second, a download can need more free space than the final file size while it lands, because the data may be written to a temporary or partial file and then verified or unpacked. If the connection drops or the disk fills mid-write, you are left with a corrupt partial file.
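A pre-flight check along these lines can catch the problem before a long download starts. This is a minimal sketch: the function names are hypothetical, and the 20% headroom default is an assumption for illustration, not a measured requirement of Ollama or Hugging Face.

```shell
# needed_kb FILE_GB [HEADROOM_PERCENT]
# Print the KiB that should be free before starting: file size plus headroom.
# The 20% default is an assumption, not a documented requirement.
needed_kb() {
    awk -v gb="$1" -v pct="${2:-20}" \
        'BEGIN { printf "%.0f\n", gb * 1024 * 1024 * (100 + pct) / 100 }'
}

# free_kb DIR
# Print the KiB available on the filesystem that holds DIR (POSIX df -Pk).
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# has_room DIR FILE_GB [HEADROOM_PERCENT]
# Succeed only if DIR's filesystem has room for the file plus headroom.
has_room() {
    [ "$(free_kb "$1")" -ge "$(needed_kb "$2" "${3:-20}")" ]
}
```

For example, `has_room "$HOME" 40 || echo "not enough space"` checks for a 40 GB file before you pull it. Point the check at the directory the download will actually be written to, since that may sit on a different partition from your current directory.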

How to fix it

1. Download a smaller quant: less disk, same model (most common fix)

The quant you choose is also the file size you're committing to disk. A Q8_0 of a model is roughly 1.7 to 1.8 times the size of its Q4_K_M (about 8.5 versus roughly 4.85 bits per weight), for quality you likely won't notice. Picking a right-sized quant means a smaller download that both fits your disk and runs better on your GPU, so it solves the storage problem and the VRAM problem at once. Check which quant you actually need before pulling a 40 GB file.
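As a rough worked example of how quant choice maps to file size, the sketch below estimates size as parameters times bits per weight, divided by 8. The helper name est_gb is hypothetical, the 8B model is illustrative, and the bits-per-weight figures are approximations; real files vary somewhat by architecture and carry a little metadata.

```shell
# est_gb PARAMS_BILLIONS BITS_PER_WEIGHT
# Rough GGUF size in GB (10^9 bytes): parameters x bits per weight / 8.
est_gb() {
    awk -v p="$1" -v bpw="$2" 'BEGIN { printf "%.2f\n", p * bpw / 8 }'
}

# Approximate bits per weight (assumed values, for illustration only).
Q8_0_BPW=8.5
Q4_K_M_BPW=4.85

est_gb 8 "$Q8_0_BPW"      # hypothetical 8B-parameter model at Q8_0
est_gb 8 "$Q4_K_M_BPW"    # the same model at Q4_K_M
```

Under these assumptions the Q8_0 estimate comes out a bit under twice the Q4_K_M estimate. Treat the numbers as a planning guide and check the actual file size listed by the download source.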


2. Check and free disk space

See where space is going and clear room. Old models you no longer use are the easiest win — each can be many gigabytes.

df -h .              # free space on this filesystem
ollama list          # see installed models and sizes
ollama rm <model>    # remove ones you do not need

3. Point the model store at a bigger drive

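If your system disk is small but you have a larger drive, move the model directory there. Ollama respects OLLAMA_MODELS; Hugging Face respects HF_HOME. Set the variable to a path on the roomy disk before downloading. The Ollama server process is what reads OLLAMA_MODELS, so if Ollama runs as a background service, set the variable in that service's environment and restart it; an export in your shell only affects a server started from that shell.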

export OLLAMA_MODELS=/mnt/big-drive/ollama
# or for Hugging Face downloads:
export HF_HOME=/mnt/big-drive/hf
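If models already sit on the small disk, they can be moved rather than downloaded again. The sketch below assumes the existing store is an ordinary directory (for a per-user Ollama install that is typically ~/.ollama/models) and that nothing is writing to it during the move. The function name relocate_store is hypothetical.

```shell
# relocate_store SRC DST
# Copy a model directory to DST and delete SRC only if the copy succeeded.
relocate_store() {
    src=$1
    dst=$2
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dst" || return 1
    # "SRC/." copies the directory's contents, hidden files included.
    cp -Rp "$src"/. "$dst"/ || return 1
    rm -rf "$src"
}

# Example (placeholder paths; stop the Ollama server first):
#   relocate_store "$HOME/.ollama/models" /mnt/big-drive/ollama
#   export OLLAMA_MODELS=/mnt/big-drive/ollama
```

Copying before deleting needs enough room on the destination for the whole store, but it avoids losing models if the copy fails partway through.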

4. Resume instead of restarting on a flaky connection

For large pulls, a dropped connection shouldn't mean starting over. Ollama resumes an interrupted pull if you re-run it; for direct Hugging Face downloads, use a tool that supports resuming so you don't re-fetch tens of gigabytes.

# Ollama: just re-run, it resumes
ollama pull <model>

# HF: the CLI resumes partial downloads
huggingface-cli download <repo> <file.gguf>
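On a very unreliable connection, the re-run itself can be automated. This is a minimal sketch of a generic retry wrapper. The function name retry and the attempt count and delay are arbitrary choices, and it relies on the wrapped command resuming its own partial download, as described above.

```shell
# retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Re-run COMMAND until it succeeds or MAX_TRIES attempts have failed.
retry() {
    max=$1
    delay=$2
    shift 2
    n=1
    until "$@"; do
        if [ "$n" -ge "$max" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        echo "attempt $n failed, retrying in ${delay}s" >&2
        sleep "$delay"
        n=$((n + 1))
    done
}

# Example (the model name is a placeholder):
#   retry 5 30 ollama pull <model>
```

Retrying will not help if the failure is a full disk, so check free space first if every attempt dies at the same point.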


Frequently asked questions

How big are GGUF model files?

It varies with model size and quant: from under a gigabyte for small models at low quant to 40 GB+ for large models at high quant. The quant level is the main lever you control; for the same model, a Q4_K_M file is a little over half the size of a Q8_0.

My download keeps corrupting near the end — why?

Usually either you ran out of disk mid-write, or the connection dropped. Free space first, then use a resuming download (re-run ollama pull, or the HF CLI) so a blip does not waste the whole transfer.
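To tell a corrupt file from a good one, you can compare its SHA-256 digest with the one published by the source. The sketch below assumes you already have the expected digest (for example, from the file's page on Hugging Face) and that either sha256sum or shasum is installed. Both function names are hypothetical.

```shell
# sha256_of FILE
# Print the file's SHA-256 digest as lowercase hex.
sha256_of() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | awk '{ print $1 }'
    else
        shasum -a 256 "$1" | awk '{ print $1 }'   # fallback, e.g. on macOS
    fi
}

# verify_sha256 FILE EXPECTED_HEX
# Succeed only if the file's digest equals EXPECTED_HEX.
verify_sha256() {
    [ "$(sha256_of "$1")" = "$2" ]
}

# Example (file name and digest are placeholders):
#   verify_sha256 model.gguf <expected-sha256> || echo "corrupt, re-download"
```

A mismatch means the file on disk is not the file that was published, so delete it and download again once there is enough free space.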

Can I move models off my system drive?

Yes. Set OLLAMA_MODELS (Ollama) or HF_HOME (Hugging Face) to a directory on a larger drive before downloading. Existing models can be relocated too: move the current model directory to the new location and point the same variable at it. This keeps big GGUF files off a small system partition.