Qwen 3.5 — Local AI Model by Alibaba Cloud

Name: LLM Configurator — GPU VRAM Checker
Author: LLM Configurator

Autor: Jakub Rusinowski · Ostatnia aktualizacja: 10 lipca 2026

Alibaba's next-generation flagship family, released in waves from Feb 16 to Mar 2, 2026 (397B-A17B first, then the 122B/35B-A3B/27B tier on Feb 24, then the 0.8B–9B 'Small' series on Mar 2). Qwen 3.5 introduces a hybrid Gated DeltaNet + MoE architecture delivering frontier performance at a fraction of the compute, with native multimodal vision, ~256K context (extensible to 1M on the larger tiers), 201 languages, and Apache 2.0 license. Note: the 397B-A17B flagship is only available as a cloud-hosted Ollama tag (`:cloud`), not a true local download — everything 0.8B–122B-A10B has standard local Ollama tags.

Hardware Requirements

Qwen 3.5 0.8B	Min 1 GB VRAM · Q4_K_M · 256,000 ctx · `ollama run qwen3.5:0.8b`
Qwen 3.5 2B	Min 2 GB VRAM · Q4_K_M · 256,000 ctx · `ollama run qwen3.5:2b`
Qwen 3.5 4B	Min 3 GB VRAM · Q4_K_M · 256,000 ctx · `ollama run qwen3.5:4b`
Qwen 3.5 9B	Min 6 GB VRAM · Q4_K_M · 256,000 ctx · `ollama run qwen3.5:9b`
Qwen 3.5 27B	Min 17 GB VRAM · Q4_K_M · 262,144 ctx · `ollama run qwen3.5:27b`
Qwen 3.5 35B-A3B	Min 22 GB VRAM · Q4_K_M · 262,144 ctx · `ollama run qwen3.5:35b-a3b`

Recommended GPU

The cheapest GPU that runs Qwen 3.5 locally (min 1 GB VRAM) is the Intel Arc B570 (10 GB).

Ujawnienie afiliacyjne: Niektóre odnośniki na tej stronie to linki afiliacyjne — jeśli dokonasz zakupu za ich pośrednictwem, LLM Configurator może otrzymać prowizję bez dodatkowych kosztów dla Ciebie. Jako uczestnik programu Amazon Associates, LLM Configurator zarabia na kwalifikujących się zakupach.

Intel Arc B570 10GB

Sugerowana cena premierowa: $219

Ceny w 2026 są niestabilne — sprawdź aktualną ofertę.

Sprawdź cenę na Amazon

How to Run Locally

Install Ollama then run: ollama run qwen3.5:0.8b

Minimum VRAM: 1 GB. For best results use Q4_K_M quantization.

Qwen 3.5 — Frequently Asked Questions

How much VRAM does Qwen 3.5 need?

Qwen 3.5 needs about 1 GB VRAM at Q4_K_M quantization for its smallest variant. Variants: Qwen 3.5 0.8B (1 GB, Q4_K_M); Qwen 3.5 2B (2 GB, Q4_K_M); Qwen 3.5 4B (3 GB, Q4_K_M); Qwen 3.5 9B (6 GB, Q4_K_M). On Apple Silicon, unified memory counts toward this requirement.

Can I run Qwen 3.5 on an RTX 4090 (24 GB)?

Yes — Qwen 3.5 runs on an RTX 4090 (24 GB) and other 24 GB cards such as the RTX 3090. Smaller variants also fit comfortably on 8–16 GB GPUs at Q4_K_M.

What quantization should I use for Qwen 3.5?

Q4_K_M is the best balance of quality and VRAM for Qwen 3.5 in most cases. Choose Q8_0 for near-lossless quality if you have spare VRAM, or smaller quants (Q3/Q2) only when memory is tight.

How do I run Qwen 3.5 with Ollama?

Install Ollama, then run: ollama run qwen3.5:0.8b. This downloads Qwen 3.5 and starts a local, OpenAI-compatible endpoint — no internet connection is needed after the initial download.