Faster HuggingFace model downloads — practical playbook

A 70B checkpoint in bf16 is about 140 GB. Pulled directly from huggingface.co it takes 4+ hours; through hf-mirror with parallel connections on a 1 GbE link it takes 22 minutes. Below is a working set of options for 2026.

Reality check

huggingface.co is unreliable from inside China. Gated models additionally require authentication (huggingface-cli login or an HF_TOKEN), and resumed downloads frequently die with broken-pipe errors. Below is the set of paths we actually use on customer sites, in recommended order.

Option 1 — hf-mirror.com (first pick)

Community-maintained reverse proxy, stable since 2024. One env var:

export HF_ENDPOINT=https://hf-mirror.com

After that, every huggingface_hub / transformers / datasets / peft call routes through the mirror — no code changes:

from transformers import AutoModelForCausalLM
m = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-72B-Instruct")  # via hf-mirror

CLI is the same:

HF_ENDPOINT=https://hf-mirror.com \
huggingface-cli download Qwen/Qwen3-72B-Instruct \
  --local-dir ./qwen3-72b --local-dir-use-symlinks False
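From Python, huggingface_hub's snapshot_download does the same job and can also skip files you don't need, which avoids pulling duplicate weight formats on repos that ship both .safetensors and legacy .bin files. A sketch only: the pattern list is an assumption about what a typical transformers checkpoint needs (check it against the repo's file list), and fetch_snapshot is a hypothetical helper name.

```python
from pathlib import Path

# Assumed minimal file set for a transformers checkpoint: configs, tokenizer
# files, safetensors weights, and any custom modeling code. Adjust per repo.
ALLOW_PATTERNS = ["*.json", "*.safetensors", "*.model", "*.txt", "*.py"]


def fetch_snapshot(repo_id: str, local_dir: str,
                   endpoint: str = "https://hf-mirror.com",
                   workers: int = 8) -> Path:
    """Download a filtered snapshot of `repo_id` into `local_dir` (hypothetical helper)."""
    # Imported lazily so this module still loads where huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    path = snapshot_download(
        repo_id=repo_id,
        local_dir=local_dir,
        endpoint=endpoint,              # explicit endpoint for this call
        allow_patterns=ALLOW_PATTERNS,  # skips e.g. duplicate *.bin weights
        max_workers=workers,            # number of files fetched in parallel
    )
    return Path(path)


# Usage (needs network access):
# fetch_snapshot("Qwen/Qwen3-72B-Instruct", "./qwen3-72b")
```

The CLI equivalent of the filter is huggingface-cli download's --include option.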

Add the export to ~/.bashrc and forget about it.
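Where you can't touch the shell (a notebook, a job launcher), set the variable from Python instead. huggingface_hub reads HF_ENDPOINT once, when it is first imported, so the assignment has to run before any import of huggingface_hub, transformers, datasets, or peft. A minimal sketch; use_hf_mirror is a hypothetical helper, not part of any library:

```python
import os
import sys
import warnings

HF_MIRROR = "https://hf-mirror.com"


def use_hf_mirror(endpoint: str = HF_MIRROR) -> str:
    """Route Hugging Face downloads through `endpoint` for this process.

    Hypothetical helper. Child processes started afterwards inherit the
    variable through os.environ.
    """
    if "huggingface_hub" in sys.modules:
        # huggingface_hub resolves HF_ENDPOINT at import time, so a module
        # that is already loaded keeps whatever endpoint it saw then.
        warnings.warn("huggingface_hub already imported; HF_ENDPOINT may be ignored")
    os.environ["HF_ENDPOINT"] = endpoint
    return endpoint


use_hf_mirror()

# Import the HF stack only after the endpoint is set, e.g.:
# from transformers import AutoModelForCausalLM
```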

Option 2 — ModelScope (Alibaba)

Native China CDN, even more reliable than hf-mirror — but does not mirror every HF model. Mainstream open models (Qwen, Llama, DeepSeek, Mistral, Gemma) are there; long-tail ones may not be.

pip install modelscope
modelscope download --model Qwen/Qwen3-72B-Instruct \
  --local_dir ./qwen3-72b

The Python API is HF-compatible:

from modelscope import AutoModelForCausalLM
m = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-72B-Instruct")

Option 3 — hfd script + aria2 multi-threaded

Community hfd.sh wraps aria2c for parallel chunked download. Great fit for 1 GbE+ links:

wget https://hf-mirror.com/hfd/hfd.sh && chmod +x hfd.sh
./hfd.sh Qwen/Qwen3-72B-Instruct --tool aria2c -x 16 \
  --local-dir ./qwen3-72b

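-x 16 sets aria2c's connections per server, and 16 is enough to saturate 1 GbE. Stock aria2c caps -x at 16, so on 10 GbE raise the number of files downloaded concurrently instead (hfd.sh's -j option) rather than asking for 32–64 connections.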
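The speed-up comes from splitting one large file into byte ranges and fetching them over separate connections with HTTP Range requests. The sketch below shows even splitting only; it is illustrative, not aria2c's actual algorithm (aria2c also enforces a minimum piece size via -k), and split_ranges is a hypothetical name.

```python
def split_ranges(total_bytes: int, connections: int) -> list[str]:
    """Split [0, total_bytes) into near-equal HTTP Range header values."""
    if total_bytes <= 0 or connections <= 0:
        raise ValueError("total_bytes and connections must be positive")
    connections = min(connections, total_bytes)  # never emit empty ranges
    base, extra = divmod(total_bytes, connections)
    ranges, start = [], 0
    for i in range(connections):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        end = start + size - 1                 # Range end offsets are inclusive
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges


print(split_ranges(10, 3))  # ['bytes=0-3', 'bytes=4-6', 'bytes=7-9']
```

Each string goes into a Range header on its own connection; the client writes each response body at its offset in the output file.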

Option 4 — gated models

Some models (Llama family, Gemma) require accepting a license on the HF web UI. Flow:

1. In a browser, open https://huggingface.co/<model> and click "Agree and access repository".
2. Mint a read token in HF settings.
3. On the node:

export HF_TOKEN=hf_xxxxxxxxxxxxxxxxx
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download meta-llama/Llama-3.3-70B-Instruct \
  --token $HF_TOKEN --local-dir ./llama3.3-70b

hf-mirror passes the token through; the license check still happens upstream.
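The token travels as a standard bearer token in the Authorization header, which is why a reverse proxy can forward it unchanged. A small sketch for scripts that call the endpoint directly; hf_auth_headers is a hypothetical helper, and the hf_ prefix check only matches the token format shown above.

```python
import os


def hf_auth_headers(env=os.environ) -> dict:
    """Build request headers carrying HF_TOKEN as a bearer token (hypothetical helper)."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; requests for gated files will be rejected")
    if not token.startswith("hf_"):
        # User access tokens look like hf_xxx...; anything else is likely a paste error.
        raise ValueError("HF_TOKEN does not look like a Hugging Face token")
    return {"Authorization": f"Bearer {token}"}
```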

Option 5 — internal cache (recommended for any team)

Pulling from public infra every time is wasteful. Run a Cloudflare R2 / MinIO / Aliyun OSS bucket as your model registry, push once after the first download:

# First time: upload
aws s3 sync ./qwen3-72b/ s3://my-models/qwen3-72b/ \
  --endpoint-url https://<ACCOUNT_ID>.r2.cloudflarestorage.com

# Everyone after: intra-VPC 1–10 GbE, minutes
aws s3 sync s3://my-models/qwen3-72b/ ./qwen3-72b/ \
  --endpoint-url https://<ACCOUNT_ID>.r2.cloudflarestorage.com
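To keep a shared registry consistent, it can help to key each copy by repo and revision, so a re-upload never silently replaces the weights others depend on. A sketch that only builds the aws s3 sync command lines: the bucket layout, model_prefix, and sync_argv are hypothetical, and <ACCOUNT_ID> stays a placeholder for your R2 account.

```python
import shlex

R2_ENDPOINT = "https://<ACCOUNT_ID>.r2.cloudflarestorage.com"  # placeholder, fill in


def model_prefix(bucket: str, repo_id: str, revision: str) -> str:
    """s3://<bucket>/<org>/<name>/<revision>/ (hypothetical layout)."""
    return f"s3://{bucket}/{repo_id.strip('/')}/{revision}/"


def sync_argv(src: str, dst: str, endpoint: str = R2_ENDPOINT) -> list[str]:
    """argv for `aws s3 sync` against an S3-compatible endpoint."""
    return ["aws", "s3", "sync", src, dst, "--endpoint-url", endpoint]


# Prefer the commit hash you actually downloaded over a branch name like "main".
remote = model_prefix("my-models", "Qwen/Qwen3-72B-Instruct", "main")
upload = sync_argv("./qwen3-72b/", remote)  # first download: push
pull = sync_argv(remote, "./qwen3-72b/")    # everyone else: pull
print(shlex.join(pull))
# Once the endpoint is filled in: subprocess.run(pull, check=True)
```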

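Alaya customers can use the built-in OSS: CCI/CCS nodes pull from it over the intranet, and a 70B checkpoint lands in roughly 3 minutes. Note that 140 GB in 3 minutes is about 6 Gbit/s sustained, so that figure implies an intranet path well above 1 GbE; at 1 GbE line rate the same transfer takes at least about 19 minutes.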

Measured

70B model, 140 GB, Hangzhou office 1 GbE + node 1 GbE intranet:

Path	Time	Success rate
`huggingface.co` direct	4h+, multiple drops	< 30%
`hf-mirror.com` (single connection)	38 min	100%
`hfd.sh` + aria2c -x 16	22 min	100%
ModelScope	26 min	100% (when model is mirrored)
Internal OSS (cache hit)	3 min	100%
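To read the table against the link, convert size and time into average throughput and compare it with the best case at line rate. Sizes are decimal GB and protocol overhead is ignored, so real transfers always take somewhat longer than the floor.

```python
def effective_gbps(size_gb: float, minutes: float) -> float:
    """Average throughput in Gbit/s for size_gb (decimal GB) moved in `minutes`."""
    return size_gb * 8 / (minutes * 60)


def floor_minutes(size_gb: float, link_gbps: float) -> float:
    """Best-case transfer time at full line rate, ignoring protocol overhead."""
    return size_gb * 8 / link_gbps / 60


print(round(effective_gbps(140, 22), 2))  # 0.85
print(round(floor_minutes(140, 1.0), 1))  # 18.7
```

The hfd.sh + aria2c row therefore runs at roughly 85% of 1 GbE line rate.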

Gotchas

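Older transformers (< 4.40) sometimes ignores HF_ENDPOINT; also set HUGGINGFACE_HUB_ENDPOINT. In any version, the variable must be set before the process imports huggingface_hub, which reads it once at import time.
On huggingface_hub older than 0.23, always pass --local-dir-use-symlinks False (local_dir_use_symlinks=False in Python). Otherwise the local-dir holds symlinks into the cache, and deleting the cache breaks them. From 0.23 on, --local-dir always receives real files and the flag is deprecated.
Watch progress with du -sh ./qwen3-72b/ instead of the CLI progress bar: in some versions the bar freezes while the download keeps going.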
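The same check from Python, for environments without du or for logging from a job. A sketch: dir_bytes and human are hypothetical helpers, and they report apparent file size (decimal units), which can differ from du's on-disk block count.

```python
import os


def dir_bytes(path: str) -> int:
    """Total apparent size of regular files under `path`.

    Symlinks are skipped so a local-dir pointing into the HF cache is not
    followed into the cache. os.walk does not descend into directory symlinks.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # Temp files can be renamed or removed mid-walk during a download.
                continue
    return total


def human(n: float) -> str:
    """Format a byte count with decimal units (1 KB = 1000 B)."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n < 1000 or unit == "TB":
            return f"{n:.1f} {unit}"
        n /= 1000


# Poll while a download runs, e.g. every 30 s:
# while True: print(human(dir_bytes("./qwen3-72b"))); time.sleep(30)
```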
