Accelerating Model Downloads (Hugging Face Mirror)

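In each AI computing center, pointing Hugging Face clients at the local mirror site can significantly speed up model downloads. This page shows how to set the `HF_ENDPOINT` environment variable for each region and describes three ways to download models.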

Important

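Computing center addresses may change as the centers are expanded. Always rely on the address information you actually receive for your computing center.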

Set the `HF_ENDPOINT` environment variable

Choose the mirror site for the computing center you are in. Each mirror caches mainstream open-source models.

Computing center	Linux (`bash`)	Windows (`PowerShell`)
Beijing Region 1	`export HF_ENDPOINT=http://hfmirror.mas.zetyun.cn:8082`	`$env:HF_ENDPOINT = "http://hfmirror.mas.zetyun.cn:8082"`
Beijing Region 2	`export HF_ENDPOINT=http://hfmirrora01.hd-02.zetyun.cn:8082`	`$env:HF_ENDPOINT = "http://hfmirrora01.hd-02.zetyun.cn:8082"`
Beijing Region 3	`export HF_ENDPOINT=http://hfmirror-1.hd-03.zetyun.cn:8082`	`$env:HF_ENDPOINT = "http://hfmirror-1.hd-03.zetyun.cn:8082"`
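If you launch jobs from Python rather than a shell, the same switch can be made in-process. The sketch below is a hypothetical helper, not part of any library: the region keys (`beijing-1` and so on) are made-up labels, and the URLs are copied from the table above. It has to run before `huggingface_hub` or `transformers` is imported, because `huggingface_hub` reads `HF_ENDPOINT` once at import time.

```python
import os

# Mirror URLs copied from the table above; the dictionary keys are hypothetical labels.
HF_MIRRORS = {
    "beijing-1": "http://hfmirror.mas.zetyun.cn:8082",
    "beijing-2": "http://hfmirrora01.hd-02.zetyun.cn:8082",
    "beijing-3": "http://hfmirror-1.hd-03.zetyun.cn:8082",
}


def use_hf_mirror(region: str) -> str:
    """Set HF_ENDPOINT to the mirror for `region` and return the URL.

    Call this before importing huggingface_hub or transformers:
    huggingface_hub reads HF_ENDPOINT only once, when it is imported.
    """
    try:
        endpoint = HF_MIRRORS[region]
    except KeyError:
        raise ValueError(
            f"unknown region {region!r}; expected one of {sorted(HF_MIRRORS)}"
        ) from None
    os.environ["HF_ENDPOINT"] = endpoint
    return endpoint
```

For example, call `use_hf_mirror("beijing-3")` as the first statement of a training script, and only then import `huggingface_hub`.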

List cached models

Use curl to view the list of models cached in each region:

# Beijing Region 1
curl http://hfmirror.mas.zetyun.cn:8082/repos
# Beijing Region 2
curl http://hfmirrora01.hd-02.zetyun.cn:8082/repos
# Beijing Region 3
curl http://hfmirror-1.hd-03.zetyun.cn:8082/repos
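The same listing can be fetched from Python. This is a sketch using only the standard library. The `/repos` path comes from the curl commands above, but the response format is not documented here, so the hypothetical helper `listing_mentions` does a boundary-aware text search for the repo id instead of parsing the response.

```python
import re
import urllib.request


def repos_url(endpoint: str) -> str:
    # The /repos path is taken from the curl commands above.
    return endpoint.rstrip("/") + "/repos"


def fetch_repo_listing(endpoint: str, timeout: float = 10.0) -> str:
    # Returns the raw response body; its format (JSON, plain text, ...) is not assumed.
    with urllib.request.urlopen(repos_url(endpoint), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def listing_mentions(listing: str, repo_id: str) -> bool:
    # Heuristic: match the repo id only when it is not part of a longer name,
    # so "Qwen/Qwen2.5-1.5B-Instruct" does not match "...-Instruct-GGUF".
    pattern = r"(?<![\w./-])" + re.escape(repo_id) + r"(?![\w./-])"
    return re.search(pattern, listing) is not None
```

For example, `listing_mentions(fetch_repo_listing("http://hfmirror-1.hd-03.zetyun.cn:8082"), "Qwen/Qwen-7B")` reports whether `Qwen/Qwen-7B` appears in the Beijing Region 3 listing; it needs network access to the mirror.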

Note

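If the model you need is not in the list, request that it be added to the cache through the online consultation service on the official website.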

Download models

First, install or upgrade to the latest huggingface_hub:

pip install -U huggingface_hub

Download a model with `snapshot_download`:

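import os

# Set the mirror before importing huggingface_hub: the library reads HF_ENDPOINT at import time.
os.environ["HF_ENDPOINT"] = "http://hfmirror-1.hd-03.zetyun.cn:8082"  # Beijing Region 3

from huggingface_hub import snapshot_download

# Recent huggingface_hub versions resume interrupted downloads automatically,
# so the deprecated resume_download argument is not needed.
snapshot_download(repo_id="Qwen/Qwen-7B", repo_type="model",
                  local_dir="./model_dir",
                  max_workers=8)  # number of files downloaded in parallel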
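Setting `HF_ENDPOINT` works because `huggingface_hub` builds every file URL from the endpoint prefix. On the public Hub, a file in a model repo is served at `{endpoint}/{repo_id}/resolve/{revision}/{filename}`, the layout `hf_hub_url` produces. The sketch below reproduces that layout with the standard library only (`resolve_url` is a hypothetical helper), assuming the mirror serves the same paths, which it must for `HF_ENDPOINT` to work. Requesting one small file at such a URL, for example with `curl -I`, is a quick way to check that a repo is reachable through the mirror before starting a large `snapshot_download`.

```python
from urllib.parse import quote


def resolve_url(endpoint: str, repo_id: str, filename: str, revision: str = "main") -> str:
    # Same shape as huggingface_hub's hf_hub_url for model repos; only the endpoint differs.
    # The revision is fully escaped (so "refs/pr/1" stays one path segment);
    # the filename keeps "/" so files in subfolders resolve correctly.
    return (
        f"{endpoint.rstrip('/')}/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )
```

For example, `resolve_url("http://hfmirror-1.hd-03.zetyun.cn:8082", "Qwen/Qwen-7B", "config.json")` returns `http://hfmirror-1.hd-03.zetyun.cn:8082/Qwen/Qwen-7B/resolve/main/config.json`.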

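Download a model with the Hugging Face CLI (the `hf` command that ships with huggingface_hub).

Download Qwen2.5-1.5B-Instruct and Qwen2.5-14B-Instruct on Windows (PowerShell). `--%` is PowerShell's stop-parsing token; omit it when running the same commands in Linux bash: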

hf download --% Qwen/Qwen2.5-1.5B-Instruct --local-dir Qwen/Qwen2.5-1.5B-Instruct
hf download --% Qwen/Qwen2.5-14B-Instruct --local-dir Qwen/Qwen2.5-14B-Instruct
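To fetch several models in one go, a small Python driver can call the CLI once per repo. This is a sketch: `build_download_cmd` and `download_all` are hypothetical helpers, and `hf` must be on `PATH`. Because `subprocess.run` receives an argument list and no shell is involved, the PowerShell `--%` token is not needed on either Windows or Linux.

```python
import os
import subprocess
from typing import Iterable, List, Optional

# The two repos used in the CLI examples above.
MODELS = ["Qwen/Qwen2.5-1.5B-Instruct", "Qwen/Qwen2.5-14B-Instruct"]


def build_download_cmd(repo_id: str, local_dir: Optional[str] = None) -> List[str]:
    # Like the examples above, the target directory defaults to the repo id.
    return ["hf", "download", repo_id, "--local-dir", local_dir or repo_id]


def download_all(repo_ids: Iterable[str], endpoint: str) -> None:
    # Pass the mirror to the child process only, leaving this process's env untouched.
    env = dict(os.environ, HF_ENDPOINT=endpoint)
    for repo_id in repo_ids:
        # check=True raises CalledProcessError on the first failed download.
        subprocess.run(build_download_cmd(repo_id), env=env, check=True)
```

For example, `download_all(MODELS, "http://hfmirror-1.hd-03.zetyun.cn:8082")` downloads both Qwen2.5 models through the Beijing Region 3 mirror, each into a directory named after its repo id.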

Download a model through Hugging Face example code:

When using the example code, change the model cache path to a persistent PVC path, for example:

export HF_HOME=/mnt/models
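With `HF_HOME` set, `huggingface_hub` keeps downloaded repos under `$HF_HOME/hub` (unless `HF_HUB_CACHE` overrides that location), one folder per repo named `models--<org>--<name>`. The hypothetical helper below computes that folder so you can check that a model actually landed on the PVC. It is a sketch that ignores `HF_HUB_CACHE`.

```python
import os
from typing import Optional


def default_hf_home() -> str:
    # Fallback when HF_HOME is unset: $XDG_CACHE_HOME/huggingface,
    # or ~/.cache/huggingface if XDG_CACHE_HOME is also unset.
    cache = os.environ.get("XDG_CACHE_HOME", os.path.join(os.path.expanduser("~"), ".cache"))
    return os.path.join(cache, "huggingface")


def hub_cache_dir(repo_id: str, hf_home: Optional[str] = None, repo_type: str = "model") -> str:
    # Explicit argument first, then the HF_HOME environment variable, then the default.
    home = hf_home or os.environ.get("HF_HOME") or default_hf_home()
    # Repo folders are named "<type>s--<org>--<name>".
    return os.path.join(home, "hub", f"{repo_type}s--{repo_id.replace('/', '--')}")
```

With `HF_HOME=/mnt/models` exported, `hub_cache_dir("Qwen/Qwen2.5-1.5B-Instruct")` returns `/mnt/models/hub/models--Qwen--Qwen2.5-1.5B-Instruct`.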

The following code can be run as-is; it downloads the model automatically on first use:

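import os

# Use the Beijing Region 3 mirror. HF_ENDPOINT must be set before transformers
# (which downloads through huggingface_hub) is imported, because it is read at
# import time; setdefault keeps a value already exported in the shell.
os.environ.setdefault("HF_ENDPOINT", "http://hfmirror-1.hd-03.zetyun.cn:8082")

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-1.5B-Instruct"

# The first call downloads the weights through HF_ENDPOINT into the cache under HF_HOME.
# device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)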

prompt = "write a quick sort algorithm."
messages = [
    {"role": "system", "content": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)