Accelerate model downloads (Hugging Face mirror)
Set HF_ENDPOINT inside a pod to use the internal Beijing-1/2/3/4 mirror, then download with snapshot_download, huggingface-cli, or from_pretrained
In each intelligent computing center, pointing Hugging Face tools at the local mirror endpoint can significantly speed up model downloads. This page lists the HF_ENDPOINT value for each region and three ways to download models through it.
Set the HF_ENDPOINT environment variable
Inside the pod, set HF_ENDPOINT to the mirror endpoint for your intelligent computing center before running any download command. Each mirror caches a large catalog of popular open-source models.
| Intelligent computing center | Linux (bash) | Windows (PowerShell) |
|---|---|---|
| Beijing-1 | export HF_ENDPOINT=http://hfmirror.mas.zetyun.cn:8082 | $env:HF_ENDPOINT = "http://hfmirror.mas.zetyun.cn:8082" |
| Beijing-2 | export HF_ENDPOINT=http://hfmirrora01.hd-02.zetyun.cn:8082 | $env:HF_ENDPOINT = "http://hfmirrora01.hd-02.zetyun.cn:8082" |
| Beijing-3 | export HF_ENDPOINT=http://hfmirror-1.hd-03.zetyun.cn:8082 | $env:HF_ENDPOINT = "http://hfmirror-1.hd-03.zetyun.cn:8082" |
| Beijing-4 | export HF_ENDPOINT=http://hfmirror.xn-01.zetyun.cn:8082 | $env:HF_ENDPOINT = "http://hfmirror.xn-01.zetyun.cn:8082" |
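When a Python script, rather than a shell, sets the endpoint, the region-to-URL mapping in the table can be kept in one place. The sketch below is a hypothetical helper (`HF_MIRRORS` and `use_hf_mirror` are illustrative names, not part of any library); the URLs are copied from the table. It assumes, as in the huggingface_hub releases we know of, that the library reads HF_ENDPOINT once when it is first imported, so the helper must run before `huggingface_hub` or `transformers` is imported.

```python
import os

# Mirror endpoints copied from the table above
HF_MIRRORS = {
    "Beijing-1": "http://hfmirror.mas.zetyun.cn:8082",
    "Beijing-2": "http://hfmirrora01.hd-02.zetyun.cn:8082",
    "Beijing-3": "http://hfmirror-1.hd-03.zetyun.cn:8082",
    "Beijing-4": "http://hfmirror.xn-01.zetyun.cn:8082",
}


def use_hf_mirror(region: str) -> str:
    """Set HF_ENDPOINT for the current process to the mirror of `region`.

    Call this before importing huggingface_hub or transformers, since the
    endpoint is read at import time.
    """
    try:
        endpoint = HF_MIRRORS[region]
    except KeyError:
        raise ValueError(
            f"unknown region {region!r}; expected one of {sorted(HF_MIRRORS)}"
        ) from None
    os.environ["HF_ENDPOINT"] = endpoint
    return endpoint
```

Typical use is a single `use_hf_mirror("Beijing-3")` call at the very top of a script, followed by the `huggingface_hub` or `transformers` imports.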
List cached models
Use curl to view the cached model catalog for each region:
```bash
# Beijing-1
curl http://hfmirror.mas.zetyun.cn:8082/repos
# Beijing-2
curl http://hfmirrora01.hd-02.zetyun.cn:8082/repos
# Beijing-3
curl http://hfmirror-1.hd-03.zetyun.cn:8082/repos
# Beijing-4
curl http://hfmirror.xn-01.zetyun.cn:8082/repos
```
If a model you need is not in the cached list, request it through the online inquiry on the Alaya NeW website.
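The same `/repos` request can double as a quick reachability check before starting a long download. This is a minimal sketch, assuming the mirror answers `GET <endpoint>/repos` with HTTP 200 when it is up (as the curl commands above suggest); `mirror_is_reachable` is a hypothetical helper name, built only on the standard library.

```python
import urllib.request


def mirror_is_reachable(endpoint: str, timeout: float = 5.0) -> bool:
    """Return True if GET <endpoint>/repos answers HTTP 200 within `timeout` seconds."""
    url = endpoint.rstrip("/") + "/repos"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (non-2xx), refused connections and timeouts
        # are all OSError subclasses
        return False
```

From a pod in Beijing-3, for example, `mirror_is_reachable("http://hfmirror-1.hd-03.zetyun.cn:8082")` should return `True` if the mirror is up.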
Download models
First install the latest huggingface_hub:
```bash
pip install -U huggingface_hub
```
Method A: snapshot_download
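```python
import os

# Set HF_ENDPOINT before importing huggingface_hub: the library reads it at import time.
# Beijing-3 mirror shown; use the endpoint for your region from the table above.
os.environ["HF_ENDPOINT"] = "http://hfmirror-1.hd-03.zetyun.cn:8082"

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Qwen/Qwen-7B",
    repo_type="model",
    local_dir="./model_dir",
    max_workers=8,  # number of files downloaded in parallel
)
# Interrupted downloads resume automatically in recent huggingface_hub releases,
# so the deprecated resume_download flag is not needed.
```
Method B: huggingface-cli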
```bash
# Requires HF_ENDPOINT to be exported first (see the table above).
# Interrupted downloads resume automatically in recent releases.
huggingface-cli download Qwen/Qwen2.5-1.5B-Instruct \
  --local-dir Qwen/Qwen2.5-1.5B-Instruct
huggingface-cli download Qwen/Qwen2.5-14B-Instruct \
  --local-dir Qwen/Qwen2.5-14B-Instruct
```
Method C: automatic download via example code
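When running example code that calls from_pretrained, export HF_ENDPOINT for your region and point the Hugging Face cache at a path on a persistent PVC, so downloaded weights survive pod restarts:
```bash
export HF_ENDPOINT=http://hfmirror-1.hd-03.zetyun.cn:8082
export HF_HOME=/mnt/models
```
Then from_pretrained downloads the model through the mirror automatically on first use: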
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# HF_ENDPOINT and HF_HOME are taken from the environment exported above,
# so no mirror-specific arguments are needed here.
model_name = "Qwen/Qwen2.5-1.5B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",  # requires the accelerate package
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "write a quick sort algorithm."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
# Render the chat into the model's prompt format and open the assistant turn
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512)
# Drop the prompt tokens so only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
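Whichever method you use, a quick local check can confirm that the files landed where you expect before you start a long job. The helpers below are hypothetical, not part of huggingface_hub. They assume the default cache layout, in which a model fetched with HF_HOME set lives under `$HF_HOME/hub/models--<org>--<name>` (an HF_HUB_CACHE setting would move it), and they treat a top-level `config.json` as a rough completeness signal, which holds for transformers models such as Qwen2.5 but is an assumption in general.

```python
from pathlib import Path


def hub_cache_dir(repo_id: str, hf_home: str) -> Path:
    """Folder where huggingface_hub caches `repo_id` when HF_HOME=hf_home.

    Assumes the default layout $HF_HOME/hub/models--<org>--<name>.
    """
    return Path(hf_home) / "hub" / ("models--" + repo_id.replace("/", "--"))


def summarize_model_dir(path) -> dict:
    """Count files and bytes under `path` and report whether config.json exists.

    Point this at a local_dir (Methods A and B) or at one snapshots/<revision>
    folder inside the cache; on the whole cache folder, snapshot symlinks
    would double-count the blob files.
    """
    root = Path(path)
    # is_file() and stat() follow symlinks, so snapshot links report real sizes.
    # The count includes huggingface_hub's own .cache metadata files, if present.
    files = [p for p in root.rglob("*") if p.is_file()]
    return {
        "files": len(files),
        "bytes": sum(p.stat().st_size for p in files),
        "has_config": (root / "config.json").is_file(),
    }
```

For example, run `summarize_model_dir("./model_dir")` after Method A, or inspect `hub_cache_dir("Qwen/Qwen2.5-1.5B-Instruct", "/mnt/models") / "snapshots"` after Method C.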
