Accelerate model downloads (Hugging Face mirror)

Set HF_ENDPOINT inside a pod to use the internal mirror for Beijing-1, Beijing-2, Beijing-3, or Beijing-4, then download models with snapshot_download, huggingface-cli, or from_pretrained.

Each intelligent computing center has its own Hugging Face mirror endpoint. Pointing `HF_ENDPOINT` at the mirror for your center can significantly speed up model downloads. This page lists the `HF_ENDPOINT` value for each center and shows three ways to download models.

Set the `HF_ENDPOINT` environment variable

Inside the pod, pick the mirror endpoint that matches your intelligent computing center. The mirror caches a large catalog of popular open-source models.

Intelligent computing center	Linux (`bash`)	Windows (`PowerShell`)
Beijing-1	`export HF_ENDPOINT=http://hfmirror.mas.zetyun.cn:8082`	`$env:HF_ENDPOINT = "http://hfmirror.mas.zetyun.cn:8082"`
Beijing-2	`export HF_ENDPOINT=http://hfmirrora01.hd-02.zetyun.cn:8082`	`$env:HF_ENDPOINT = "http://hfmirrora01.hd-02.zetyun.cn:8082"`
Beijing-3	`export HF_ENDPOINT=http://hfmirror-1.hd-03.zetyun.cn:8082`	`$env:HF_ENDPOINT = "http://hfmirror-1.hd-03.zetyun.cn:8082"`
Beijing-4	`export HF_ENDPOINT=http://hfmirror.xn-01.zetyun.cn:8082`	`$env:HF_ENDPOINT = "http://hfmirror.xn-01.zetyun.cn:8082"`
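
The same selection can be made from Python. The sketch below is illustrative rather than an official helper: `MIRRORS` and `use_mirror` are hypothetical names, and the URLs are copied from the table above. `HF_ENDPOINT` has to be set before `huggingface_hub` (or `transformers`) is imported, because the library reads the variable at import time.

```python
import os

# Mirror endpoints per intelligent computing center (copied from the table above).
MIRRORS = {
    "Beijing-1": "http://hfmirror.mas.zetyun.cn:8082",
    "Beijing-2": "http://hfmirrora01.hd-02.zetyun.cn:8082",
    "Beijing-3": "http://hfmirror-1.hd-03.zetyun.cn:8082",
    "Beijing-4": "http://hfmirror.xn-01.zetyun.cn:8082",
}


def use_mirror(center: str) -> str:
    """Set HF_ENDPOINT for the given center and return the URL (hypothetical helper)."""
    try:
        url = MIRRORS[center]
    except KeyError:
        raise ValueError(
            f"unknown center {center!r}; expected one of {sorted(MIRRORS)}"
        ) from None
    os.environ["HF_ENDPOINT"] = url
    return url


# Call this before importing huggingface_hub or transformers.
use_mirror("Beijing-3")
```

Setting the variable in the process environment this way affects only the current Python process and any child processes it starts.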

List cached models

Use curl to view the cached model catalog for each region:

# Beijing-1
curl http://hfmirror.mas.zetyun.cn:8082/repos
# Beijing-2
curl http://hfmirrora01.hd-02.zetyun.cn:8082/repos
# Beijing-3
curl http://hfmirror-1.hd-03.zetyun.cn:8082/repos
# Beijing-4
curl http://hfmirror.xn-01.zetyun.cn:8082/repos

If a model you need is not in the cached list, request that it be added via the online inquiry channel on the Alaya NeW website.
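
The catalog check can also be scripted. This is a rough sketch under one important assumption: the response format of `/repos` is not documented on this page, so the code treats the body as plain text and only tests whether the repo id appears in it verbatim. A short id can therefore match a longer one that starts with the same characters. `fetch_catalog` and `is_cached` are hypothetical helper names.

```python
import urllib.request


def fetch_catalog(endpoint: str, timeout: float = 10.0) -> str:
    """Return the raw body of <endpoint>/repos as text."""
    url = endpoint.rstrip("/") + "/repos"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def is_cached(catalog_text: str, repo_id: str) -> bool:
    """Substring check; assumes repo ids appear verbatim in the response body."""
    return repo_id in catalog_text
```

Inside a pod, a call such as `is_cached(fetch_catalog("http://hfmirror-1.hd-03.zetyun.cn:8082"), "Qwen/Qwen2.5-1.5B-Instruct")` would report whether the Beijing-3 mirror lists that model.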

Download models

First install the latest huggingface_hub:

pip install -U huggingface_hub

Method A: `snapshot_download`

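import os

# Set the endpoint before importing huggingface_hub: the library reads
# HF_ENDPOINT once, at import time.
os.environ["HF_ENDPOINT"] = "http://hfmirror-1.hd-03.zetyun.cn:8082"  # Beijing-3

from huggingface_hub import snapshot_download

# resume_download is omitted: recent huggingface_hub releases deprecate it
# and always resume interrupted downloads when possible.
snapshot_download(
    repo_id="Qwen/Qwen-7B",
    repo_type="model",
    local_dir="./model_dir",  # files are written to this directory
    max_workers=8,            # number of parallel file downloads
)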
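
After the download finishes, it can help to confirm that files actually landed in `local_dir`. The following is a minimal sketch using only the standard library; `summarize_dir` is a hypothetical helper, and `./model_dir` is the directory from the example above. Hidden files and folders are skipped on the assumption that they hold download bookkeeping rather than model files (recent `huggingface_hub` versions may create a `.cache` folder there).

```python
from pathlib import Path


def summarize_dir(path: str) -> tuple[int, int]:
    """Return (file_count, total_bytes) for regular, non-hidden files under path."""
    root = Path(path)
    count = 0
    total = 0
    for p in root.rglob("*"):
        rel_parts = p.relative_to(root).parts
        if any(part.startswith(".") for part in rel_parts):
            continue  # skip hidden files and anything inside hidden folders
        if p.is_file():
            count += 1
            total += p.stat().st_size
    return count, total


files, size = summarize_dir("./model_dir")
print(f"{files} files, {size / 1e9:.2f} GB")
```

A count of zero suggests the download did not complete or was written to a different directory.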

Method B: `huggingface-cli`

huggingface-cli download --resume-download Qwen/Qwen2.5-1.5B-Instruct \
  --local-dir Qwen/Qwen2.5-1.5B-Instruct

huggingface-cli download --resume-download Qwen/Qwen2.5-14B-Instruct \
  --local-dir Qwen/Qwen2.5-14B-Instruct

Method C: automatic download via example code

When running example code that calls `from_pretrained`, point the Hugging Face cache (`HF_HOME`) at a persistent PVC path so that downloaded weights survive pod restarts:

export HF_HOME=/mnt/models
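
To see where a model would end up under that setting, the cache location can be derived from the environment. This sketch follows the documented defaults of `huggingface_hub` (`HF_HUB_CACHE` if set, otherwise `$HF_HOME/hub`, with repos stored in `models--<org>--<name>` folders), but it is a simplification: it ignores `XDG_CACHE_HOME` and older variables such as `TRANSFORMERS_CACHE`. `hub_cache_dir` and `repo_cache_dir` are hypothetical helper names.

```python
import os
from pathlib import Path


def hub_cache_dir() -> Path:
    """Resolve the hub cache directory from HF_HUB_CACHE or HF_HOME."""
    explicit = os.environ.get("HF_HUB_CACHE")
    if explicit:
        return Path(explicit).expanduser()
    hf_home = os.environ.get("HF_HOME", "~/.cache/huggingface")
    return Path(hf_home).expanduser() / "hub"


def repo_cache_dir(repo_id: str) -> Path:
    """Expected cache folder for a model repo, e.g. models--Qwen--Qwen2.5-1.5B-Instruct."""
    return hub_cache_dir() / ("models--" + repo_id.replace("/", "--"))


print(repo_cache_dir("Qwen/Qwen2.5-1.5B-Instruct"))
```

If the printed path is not on the PVC mount, `HF_HOME` was probably not exported in the shell that launched Python.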

With `HF_ENDPOINT` and `HF_HOME` exported in the shell, `from_pretrained` pulls the model from the mirror automatically on first use:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-1.5B-Instruct"

# No proxies argument is needed: the mirror is an endpoint, not an HTTP proxy.
# It is selected by HF_ENDPOINT, which must be exported before Python starts
# (see the table above).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "write a quick sort algorithm."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=512)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
