Faster HuggingFace model downloads — practical playbook
A 70B checkpoint is 140 GB in bf16. Direct from huggingface.co it takes 4+ hours; through hf-mirror with a multi-connection downloader saturating 1 GbE, it takes 22 minutes. Here is a working set of options for 2026.
Reality check
huggingface.co is unreliable from inside China. Gated models additionally require huggingface-cli login, and long downloads (resumed ones included) frequently fail with broken-pipe errors. Below are the paths we actually use on customer sites, in recommended order.
Option 1 — hf-mirror.com (first pick)
Community-maintained reverse proxy, stable since 2024. One env var:
export HF_ENDPOINT=https://hf-mirror.com
After that, every huggingface_hub / transformers / datasets / peft call routes through the mirror, with no code changes:
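from transformers import AutoModelForCausalLM
# HF_ENDPOINT is read when huggingface_hub is imported: export it before Python starts
# (or set os.environ before importing transformers)
m = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-72B-Instruct")  # downloads via hf-mirror
CLI is the same: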
HF_ENDPOINT=https://hf-mirror.com \
huggingface-cli download Qwen/Qwen3-72B-Instruct \
--local-dir ./qwen3-72b --local-dir-use-symlinks False
Add the export to ~/.bashrc and forget about it.
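Why one variable is enough: huggingface_hub builds every file URL from HF_ENDPOINT plus the repo path, so changing the endpoint reroutes every library built on top of it. The stdlib sketch below rebuilds that URL pattern ({endpoint}/{repo_id}/resolve/{revision}/{filename}) so you can see, or curl, exactly what a mirrored request looks like. resolve_url is a hypothetical helper; the library's own equivalent is huggingface_hub.hf_hub_url.

```python
import os
from urllib.parse import quote

DEFAULT_ENDPOINT = "https://huggingface.co"
# Repo-type prefixes in Hub URLs; models have none.
PREFIXES = {"model": "", "dataset": "datasets/", "space": "spaces/"}


def resolve_url(repo_id: str, filename: str, revision: str = "main",
                repo_type: str = "model") -> str:
    """Download URL for one file, built from HF_ENDPOINT like the Hub client does."""
    endpoint = os.environ.get("HF_ENDPOINT", DEFAULT_ENDPOINT).rstrip("/")
    # Revisions such as "refs/pr/1" are fully escaped; slashes in filenames are kept.
    return (f"{endpoint}/{PREFIXES[repo_type]}{repo_id}/resolve/"
            f"{quote(revision, safe='')}/{quote(filename)}")
```

With HF_ENDPOINT=https://hf-mirror.com, resolve_url("Qwen/Qwen3-72B-Instruct", "config.json") gives https://hf-mirror.com/Qwen/Qwen3-72B-Instruct/resolve/main/config.json. Unlike huggingface_hub, which reads HF_ENDPOINT once at import, this helper reads it on every call.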
Option 2 — ModelScope (Alibaba)
Native China CDN, even more reliable than hf-mirror — but does not mirror every HF model. Mainstream open models (Qwen, Llama, DeepSeek, Mistral, Gemma) are there; long-tail ones may not be.
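Because ModelScope's coverage is partial, one way to combine it with hf-mirror is a fallback chain: try ModelScope first, and if that download fails, retry the same repo through hf-mirror. The sketch shells out to the two CLIs used in this guide. The ordering, the function names (sources_for, fetch_with_fallback), and the assumption that a missing repo makes the CLI exit non-zero are ours, not part of either tool.

```python
import os
import subprocess
from typing import Callable, Dict, List, Sequence, Tuple

# (label, argv, extra environment variables) for one download path.
Source = Tuple[str, List[str], Dict[str, str]]


def sources_for(repo_id: str, local_dir: str) -> List[Source]:
    """ModelScope first, hf-mirror as fallback (an assumed ordering)."""
    return [
        ("modelscope",
         ["modelscope", "download", "--model", repo_id, "--local_dir", local_dir],
         {}),
        ("hf-mirror",
         ["huggingface-cli", "download", repo_id, "--local-dir", local_dir],
         {"HF_ENDPOINT": "https://hf-mirror.com"}),
    ]


def fetch_with_fallback(sources: Sequence[Source],
                        run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str:
    """Try each source in order; return the label of the first that exits 0."""
    failures = []
    for label, argv, extra_env in sources:
        env = {**os.environ, **extra_env}  # inherit the shell env, add overrides
        try:
            result = run(argv, env=env)
        except FileNotFoundError:  # CLI not on PATH
            failures.append(f"{label}: {argv[0]} not installed")
            continue
        if result.returncode == 0:
            return label
        failures.append(f"{label}: exit code {result.returncode}")
    raise RuntimeError("all sources failed: " + "; ".join(failures))
```

Usage: fetch_with_fallback(sources_for("Qwen/Qwen3-72B-Instruct", "./qwen3-72b")) returns "modelscope" or "hf-mirror", whichever path succeeded. The run parameter exists so the chain can be exercised without network access.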
pip install modelscope
modelscope download --model Qwen/Qwen3-72B-Instruct \
--local_dir ./qwen3-72b
The Python API is HF-compatible:
from modelscope import AutoModelForCausalLM
m = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-72B-Instruct")
Option 3 — hfd script + aria2 multi-threaded
Community hfd.sh wraps aria2c for parallel chunked download. Great fit for 1 GbE+ links:
wget https://hf-mirror.com/hfd/hfd.sh && chmod +x hfd.sh
./hfd.sh Qwen/Qwen3-72B-Instruct --tool aria2c -x 16 \
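--local-dir ./qwen3-72b
-x 16 is aria2c's connections-per-server count, which aria2c caps at 16; 16 saturates 1 GbE. For 10 GbE, keep -x 16 and raise hfd's -j (number of files fetched concurrently) rather than trying to push -x past the cap.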
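Here is what aria2c's -x buys you, in miniature. The file is split into byte ranges, and each range is fetched over its own connection with an HTTP Range header, so a per-connection throughput limit no longer caps the whole transfer. This is a stdlib sketch. It assumes the server (or the CDN a mirror redirects to) answers Range requests with 206 and that the file size is already known. split_ranges and parallel_download are hypothetical names, and the sketch has none of aria2c's retry, resume, or checksum logic.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Tuple


def split_ranges(size: int, parts: int) -> List[Tuple[int, int]]:
    """Split bytes [0, size) into at most `parts` inclusive (start, end) ranges."""
    if size <= 0:
        return []
    parts = max(1, min(parts, size))
    step, extra = divmod(size, parts)
    ranges, start = [], 0
    for i in range(parts):
        length = step + (1 if i < extra else 0)  # spread the remainder over the first ranges
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def _fetch_range(url: str, dest: str, start: int, end: int,
                 headers: Dict[str, str]) -> None:
    """Download bytes start..end over one connection and write them in place."""
    req = urllib.request.Request(url, headers={**headers, "Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        if resp.status != 206:  # 200 would mean the server ignored Range
            raise RuntimeError(f"no partial-content support (HTTP {resp.status})")
        with open(dest, "r+b") as f:  # own handle per thread, so seeks don't clash
            f.seek(start)
            while chunk := resp.read(1 << 20):
                f.write(chunk)


def parallel_download(url: str, size: int, dest: str, connections: int = 16,
                      headers: Optional[Dict[str, str]] = None) -> None:
    """Fetch one file of known `size` over `connections` parallel Range requests."""
    with open(dest, "wb") as f:
        f.truncate(size)  # pre-size the file so every range has somewhere to land
    with ThreadPoolExecutor(max_workers=connections) as pool:
        futures = [pool.submit(_fetch_range, url, dest, s, e, headers or {})
                   for s, e in split_ranges(size, connections)]
        for fut in futures:
            fut.result()  # re-raise the first failure, if any
```

Each worker opens its own file handle and writes at its own offset, so no locking is needed. On a faster link you would also parallelise across files, much as hfd's -j does.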
Option 4 — gated models
Some models (Llama family, Gemma) require accepting a license on the HF web UI. Flow:
- In a browser, open https://huggingface.co/<model> and click "Agree and access repository".
- Mint a read token in HF settings.
- On the node:
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxx
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download meta-llama/Llama-3.3-70B-Instruct \
--token "$HF_TOKEN" --local-dir ./llama3.3-70b
hf-mirror passes the token through; the license check still happens upstream.
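Before a 140 GB pull, it can pay to confirm the token and the license in one small request. The sketch below sends an authenticated HEAD for one file of the repo through whatever HF_ENDPOINT points at, using the standard Bearer header that Hub tokens travel in. It defaults to config.json, which assumes the repo contains that file. preflight_request, explain, and check_access are hypothetical helpers. The status messages follow generic HTTP meaning, and the mirror's exact codes may differ.

```python
import os
import urllib.error
import urllib.request


def preflight_request(repo_id: str, filename: str = "config.json",
                      revision: str = "main") -> urllib.request.Request:
    """HEAD one small file of a gated repo, authenticated with HF_TOKEN."""
    endpoint = os.environ.get("HF_ENDPOINT", "https://huggingface.co").rstrip("/")
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; mint a read token first")
    url = f"{endpoint}/{repo_id}/resolve/{revision}/{filename}"
    return urllib.request.Request(
        url, method="HEAD", headers={"Authorization": f"Bearer {token}"})


def explain(status: int) -> str:
    """Generic HTTP reading of the status code."""
    if 200 <= status < 400:
        return "ok: token accepted"
    if status == 401:
        return "401: token missing or invalid"
    if status == 403:
        return "403: authenticated but no access (license not accepted?)"
    return f"{status}: unexpected, check the repo id and revision"


def check_access(repo_id: str) -> str:
    """Run the preflight and return a one-line verdict."""
    try:
        with urllib.request.urlopen(preflight_request(repo_id), timeout=30) as resp:
            return explain(resp.status)
    except urllib.error.HTTPError as err:  # 4xx/5xx arrive as exceptions
        return explain(err.code)
```

Usage: print(check_access("meta-llama/Llama-3.3-70B-Instruct")) with HF_TOKEN and HF_ENDPOINT exported as above.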
Option 5 — internal cache (recommended for any team)
Pulling from public infra every time is wasteful. Run a Cloudflare R2 / MinIO / Aliyun OSS bucket as your model registry and push each model to it once, after the first download:
# First time: upload
aws s3 sync ./qwen3-72b/ s3://my-models/qwen3-72b/ \
--endpoint-url https://<ACCOUNT_ID>.r2.cloudflarestorage.com
# Everyone after: intra-VPC 1–10 GbE, minutes
aws s3 sync s3://my-models/qwen3-72b/ ./qwen3-72b/ \
--endpoint-url https://<ACCOUNT_ID>.r2.cloudflarestorage.com
Alaya customers can use the built-in OSS: CCI/CCS nodes get 1 GbE intranet to OSS, 70B in roughly 3 minutes.
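Once a model lives in your own bucket, a cheap way to catch a truncated or half-synced copy is to ship a manifest with it. This is a minimal sketch that assumes a file name of our own choosing, MANIFEST.json. write_manifest records each file's size and SHA-256 before the first upload, and verify checks a freshly synced copy against it. It skips the .cache folder that recent huggingface_hub versions create inside --local-dir.

```python
import hashlib
import json
import os
from typing import Dict, List

MANIFEST_NAME = "MANIFEST.json"  # hypothetical file name


def sha256_file(path: str, chunk_size: int = 8 << 20) -> str:
    """SHA-256 of a file, read in 8 MiB chunks so memory stays flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str) -> Dict[str, dict]:
    """Map '/'-separated relative paths to {"size", "sha256"}."""
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".cache"]  # skip local-dir metadata
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            if rel == MANIFEST_NAME:
                continue
            manifest[rel] = {"size": os.path.getsize(full), "sha256": sha256_file(full)}
    return manifest


def write_manifest(root: str) -> Dict[str, dict]:
    """Build the manifest and store it as root/MANIFEST.json."""
    manifest = build_manifest(root)
    with open(os.path.join(root, MANIFEST_NAME), "w") as f:
        json.dump(manifest, f, indent=2, sort_keys=True)
    return manifest


def verify(root: str, manifest: Dict[str, dict]) -> List[str]:
    """List every missing or mismatched file; an empty list means the copy matches."""
    problems = []
    for rel, meta in sorted(manifest.items()):
        full = os.path.join(root, *rel.split("/"))
        if not os.path.isfile(full):
            problems.append(f"missing: {rel}")
        elif os.path.getsize(full) != meta["size"]:
            problems.append(f"size mismatch: {rel}")
        elif sha256_file(full) != meta["sha256"]:
            problems.append(f"hash mismatch: {rel}")
    return problems
```

Run write_manifest("./qwen3-72b") before the first upload. After each pull, load MANIFEST.json and pass it to verify. Hashing 140 GB takes a while, so comparing sizes alone is a reasonable first pass.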
Measured
70B model, 140 GB, Hangzhou office 1 GbE + node 1 GbE intranet:
| Path | Time | Success rate |
|---|---|---|
| huggingface.co direct | 4h+, multiple drops | < 30% |
| hf-mirror.com (single connection) | 38 min | 100% |
| hfd.sh + aria2c -x 16 | 22 min | 100% |
| ModelScope | 26 min | 100% (when model is mirrored) |
| Internal OSS (cache hit) | 3 min | 100% |
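The table is easier to compare as throughput. The arithmetic below converts each row to average Gbit/s. It takes 140 GB as 70B parameters at 2 bytes each (bf16) and treats GB as 10^9 bytes. The rows work out to about 0.49 Gbit/s (single connection), 0.85 (aria2c), 0.72 (ModelScope), and 6.2 (OSS cache hit). The best case for 140 GB at 1 GbE line rate is 18.7 minutes. So aria2c reaches roughly 85% of line rate, and the 3-minute cache hit implies a path well above a single 1 GbE link.

```python
def checkpoint_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Checkpoint size in decimal GB; 2 bytes/param corresponds to bf16/fp16."""
    return params_billion * bytes_per_param


def effective_gbps(size_gb: float, minutes: float) -> float:
    """Average throughput in Gbit/s for size_gb decimal GB moved in `minutes`."""
    return size_gb * 8 / (minutes * 60)


def line_rate_minutes(size_gb: float, link_gbps: float) -> float:
    """Best-case minutes to move size_gb at link_gbps, ignoring protocol overhead."""
    return size_gb * 8 / link_gbps / 60


SIZE_GB = checkpoint_gb(70)  # 140 GB
MEASURED_MIN = {  # minutes, from the measured table
    "hf-mirror.com (single connection)": 38,
    "hfd.sh + aria2c -x 16": 22,
    "ModelScope": 26,
    "Internal OSS (cache hit)": 3,
}

for path, minutes in MEASURED_MIN.items():
    print(f"{path:34s} {effective_gbps(SIZE_GB, minutes):5.2f} Gbit/s")
print(f"1 GbE line rate for {SIZE_GB} GB: {line_rate_minutes(SIZE_GB, 1):.1f} min")
```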
Gotchas
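- Older transformers (< 4.40) sometimes ignores HF_ENDPOINT; also set HUGGINGFACE_HUB_ENDPOINT.
- On huggingface_hub older than 0.23, always pass --local-dir-use-symlinks False (local_dir_use_symlinks=False in Python). Otherwise the local dir holds symlinks into the cache, and deleting the cache breaks them. From 0.23 on, the option is deprecated and ignored, and --local-dir always receives real files.
- Watch progress with du -sh ./qwen3-72b/ instead of the CLI bar; some versions wedge the bar but keep downloading.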
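The du -sh check can also be scripted, which adds a transfer rate. In this stdlib sketch, dir_size_bytes sums apparent file sizes, so it matches du --apparent-size rather than plain du, which counts allocated blocks. watch prints the total and the average MB/s every interval. Both names are hypothetical.

```python
import os
import time


def dir_size_bytes(root: str) -> int:
    """Sum of apparent file sizes under root (like `du -s --apparent-size`)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except FileNotFoundError:
                pass  # a partial file was renamed or removed mid-walk
    return total


def watch(root: str, interval: float = 10.0, rounds: int = 0) -> None:
    """Print total size and average rate every `interval` s; rounds=0 loops forever."""
    prev_bytes, prev_t = dir_size_bytes(root), time.monotonic()
    done = 0
    while rounds == 0 or done < rounds:
        time.sleep(interval)
        cur_bytes, cur_t = dir_size_bytes(root), time.monotonic()
        elapsed = max(cur_t - prev_t, 1e-9)  # guard against a zero interval
        rate = (cur_bytes - prev_bytes) / elapsed / 1e6
        print(f"{cur_bytes / 1e9:8.2f} GB  {rate:8.1f} MB/s", flush=True)
        prev_bytes, prev_t = cur_bytes, cur_t
        done += 1
```

Run watch("./qwen3-72b/") in a second terminal while the download runs. Partial files written inside the directory are counted as they grow.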
