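Create a Training Task Template
Save a set of frequently used training settings (resource specification, image, storage mounts, environment variables, start command, and so on) as a template, so that later training tasks can reuse it instead of re-entering every field. After the template is created, it appears in the training task template list and can be referenced with createType=TEMPLATE when creating a distributed training task.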
POST https://api.alayanew.com/v1/training/template/create
Authorizations
bearerAuth
Authorization (String, required): Authenticate with an Open API Key that you have already obtained. Example: Bearer [YOUR_API_KEY].
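The Bearer scheme described above can be sketched as a small helper that builds the request headers from an API key. This is a minimal illustration: the function name build_headers and the environment variable ALAYA_API_KEY are assumptions made for this sketch, not names defined by the API.

```python
import os


def build_headers(api_key: str) -> dict[str, str]:
    """Return HTTP headers that carry the Open API Key as a Bearer token."""
    key = api_key.strip()
    if not key:
        raise ValueError("API key must not be empty")
    return {
        "accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }


if __name__ == "__main__":
    # ALAYA_API_KEY is a hypothetical variable name chosen for this sketch.
    headers = build_headers(os.environ.get("ALAYA_API_KEY", "demo-key"))
    print(headers["Authorization"])
```

Reading the key from the environment keeps it out of source files; any other secret store would work equally well.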
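Request body
application/json
name (String): Template name, used to tell templates apart in the template list. Example: pytorch-2gpu-template.
desc (String): Template description, explaining the scenarios the template is intended for. Example: 常用2卡训练模板 (a commonly used 2-GPU training template).
aidcId (Integer): ID of the intelligent computing center that the template is bound to by default. Example: 2.
trainingType (String): Training type. Allowed values: PRE_TRAINING (pre-training), HPC (high-performance computing). Example: PRE_TRAINING.
trainingFramwork (String): Training framework. Allowed values: PyTorch, DeepSpeed, MPI, TensorFlow. Example: TensorFlow. Note that the field name is spelled trainingFramwork in the request body.
imageType (String): Image type. Allowed values: general (base image), application (application image), private (private image). Must correspond to the image field. Example: general.
image (String): Address of the container image selected by the user. Example: harbor.zetyun.cn/anc-public/general/tensorflow:2.16.1-gpu-jupyter.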
env (Object): Environment variables, as key-value pairs.
  empty (Boolean)
  Additional properties (Object): Extra parameters. Example: "ENV_MODE":"production","MAX_CONN":"200".
resource (Object): Resource configuration.
  empty (Boolean)
  Additional properties (Object): Extra parameters. Example: "type":"worker","gpuName":"NVIDIA-L40S-PCIE-48G","cpuCores":"1","gpuCount":"1","memoryGB":"2","productCode":"PRD-QTT","productName":"量子训练","workerCount":1,"productPrice":"0.0002B".
enableAutoRetry (Boolean): Whether automatic retry is enabled.
storageConfigs (Array<Object>): Storage configuration.
  empty (Boolean)
  Items (Object): Example: "storageId":"0000-0000-0000-0000","storageType":"nas-capacity","fileDirectory":"nas123","mountPath":"/root/nas/123","onlyRead":true.
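maxRetryCount (Integer): Maximum number of retries. Example: 3.
enableTimeoutCancel (Boolean): Whether the task is cancelled when it times out. Example: true.
timeoutHours (Integer): Training task timeout, in hours. Example: 3.
startCommand (String): Start command. Example: "python train.py --data /root/nas/123/data".
priority (Integer): Priority. Value range: 1-3. Example: 3.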
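The allowed values listed above lend themselves to a client-side check before the request is sent. The sketch below is one possible approach, not part of the API: the function name validate_template_payload is hypothetical, absent fields are skipped because this page does not state which fields are required, and the rule that env values are strings is an assumption based on the documented example.

```python
# Allowed values copied from the field descriptions above.
TRAINING_TYPES = {"PRE_TRAINING", "HPC"}
TRAINING_FRAMEWORKS = {"PyTorch", "DeepSpeed", "MPI", "TensorFlow"}
IMAGE_TYPES = {"general", "application", "private"}


def validate_template_payload(payload: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    checks = [
        ("trainingType", TRAINING_TYPES),
        ("trainingFramwork", TRAINING_FRAMEWORKS),  # spelled as in the API
        ("imageType", IMAGE_TYPES),
    ]
    for field, allowed in checks:
        value = payload.get(field)
        # Absent fields are skipped: required-ness is not documented here.
        if value is not None and value not in allowed:
            problems.append(f"{field}={value!r} is not one of {sorted(allowed)}")

    priority = payload.get("priority")
    if priority is not None and (
        isinstance(priority, bool) or priority not in (1, 2, 3)
    ):
        problems.append(f"priority={priority!r} is outside the range 1-3")

    # Assumption: env values are strings, as in the documented example.
    for key, value in (payload.get("env") or {}).items():
        if not isinstance(value, str):
            problems.append(
                f"env[{key!r}] should be a string, got {type(value).__name__}"
            )
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue in one pass.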
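Response
application/json · 200
status (Integer): Business status code. 200 indicates success.
message (String): Response message. Example: "OK".
data (Object): Creation result. No additional business data is returned on success.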
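Because the response body carries its own status field, a client may want to check it in addition to the HTTP status code. The following is a minimal sketch under that assumption; TemplateApiError and unwrap_response are hypothetical names, and this page does not state whether the HTTP status always mirrors the body status.

```python
class TemplateApiError(Exception):
    """Raised when the response body reports a non-success business status."""

    def __init__(self, status, message):
        super().__init__(f"template API returned status {status}: {message}")
        self.status = status
        self.message = message


def unwrap_response(body: dict) -> dict:
    """Return body['data'] when the business status is 200, otherwise raise."""
    status = body.get("status")
    if status != 200:
        raise TemplateApiError(status, body.get("message", ""))
    # No extra business data is documented on success, so this is usually {}.
    return body.get("data") or {}
```

A caller would pass the parsed JSON body, for example `unwrap_response(response.json())`, and handle TemplateApiError where the request is issued.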
cURL
curl -X 'POST' \
  'https://api.alayanew.com/v1/training/template/create' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer [YOUR_API_KEY]' \
  -d '{
    "name": "my training task",
    "desc": "my training task",
    "aidcId": 2,
    "trainingType": "PRE_TRAINING",
    "trainingFramwork": "TensorFlow",
    "imageType": "general",
    "image": "harbor.zetyun.cn/anc-public/general/tensorflow:2.16.1-gpu-jupyter",
    "env": {
      "ENV_MODE": "production",
      "MAX_CONN": "200"
    },
    "resource": {
      "type": "worker",
      "gpuName": "NVIDIA-L40S-PCIE-48G",
      "cpuCores": "1",
      "gpuCount": "1",
      "memoryGB": "2",
      "productCode": "PRD-QTT",
      "productName": "量子训练",
      "workerCount": 1,
      "productPrice": "0.0002B"
    },
    "enableAutoRetry": true,
    "storageConfigs": [
      {
        "storageId": "0000-0000-0000-0000",
        "storageType": "nas-capacity",
        "fileDirectory": "nas123",
        "mountPath": "/root/nas/123",
        "onlyRead": true
      }
    ],
    "maxRetryCount": 3,
    "enableTimeoutCancel": true,
    "timeoutHours": 1,
    "startCommand": "python train.py --data /root/nas/123/data",
    "priority": 3
  }'

Python
import requests

url = "https://api.alayanew.com/v1/training/template/create"
headers = {
    "accept": "application/json",
    "Content-Type": "application/json",
    # Replace [YOUR_API_KEY] with your Open API Key.
    "Authorization": "Bearer [YOUR_API_KEY]",
}
payload = {
    "name": "pytorch-2gpu-template",
    "desc": "常用2卡训练模板",
    "aidcId": 2,
    "trainingType": "PRE_TRAINING",
    "trainingFramwork": "TensorFlow",
    "imageType": "general",
    "image": "harbor.zetyun.cn/anc-public/general/tensorflow:2.16.1-gpu-jupyter",
    "env": {"ENV_MODE": "production", "MAX_CONN": "200"},
    "resource": {
        "type": "worker",
        "gpuName": "NVIDIA-L40S-PCIE-48G",
        "cpuCores": "1",
        "gpuCount": "1",
        "memoryGB": "2",
        "productCode": "PRD-QTT",
        "productName": "量子训练",
        "workerCount": 1,
        "productPrice": "0.0002B",
    },
    "storageConfigs": [
        {
            "storageId": "0000-0000-0000-0000",
            "storageType": "nas-capacity",
            "fileDirectory": "nas123",
            "mountPath": "/root/nas/123",
            "onlyRead": True,
        }
    ],
    "enableAutoRetry": True,
    "maxRetryCount": 3,
    "enableTimeoutCancel": True,
    "timeoutHours": 1,
    "startCommand": "python train.py --data /root/nas/123/data",
    "priority": 3,
}

# json=payload serializes the dict as the JSON request body.
response = requests.post(url, headers=headers, json=payload)
# Raise on HTTP-level errors (4xx/5xx) before reading the body.
response.raise_for_status()
print(response.json())

JavaScript
const payload = {
  name: 'pytorch-2gpu-template',
  desc: '常用2卡训练模板',
  aidcId: 2,
  trainingType: 'PRE_TRAINING',
  trainingFramwork: 'TensorFlow',
  imageType: 'general',
  image: 'harbor.zetyun.cn/anc-public/general/tensorflow:2.16.1-gpu-jupyter',
  env: { ENV_MODE: 'production', MAX_CONN: '200' },
  resource: {
    type: 'worker',
    gpuName: 'NVIDIA-L40S-PCIE-48G',
    cpuCores: '1',
    gpuCount: '1',
    memoryGB: '2',
    productCode: 'PRD-QTT',
    productName: '量子训练',
    workerCount: 1,
    productPrice: '0.0002B'
  },
  storageConfigs: [
    {
      storageId: '0000-0000-0000-0000',
      storageType: 'nas-capacity',
      fileDirectory: 'nas123',
      mountPath: '/root/nas/123',
      onlyRead: true
    }
  ],
  enableAutoRetry: true,
  maxRetryCount: 3,
  enableTimeoutCancel: true,
  timeoutHours: 1,
  startCommand: 'python train.py --data /root/nas/123/data',
  priority: 3
};

fetch('https://api.alayanew.com/v1/training/template/create', {
  method: 'POST',
  headers: {
    'accept': 'application/json',
    'Content-Type': 'application/json',
    'Authorization': 'Bearer [YOUR_API_KEY]'
  },
  body: JSON.stringify(payload)
})
  .then(res => {
    if (!res.ok) {
      throw new Error(`HTTP error! status: ${res.status}`);
    }
    return res.json();
  })
  .then(console.log)
  .catch(console.error);

Example response (status 200)
{
  "status": 200,
  "message": "OK",
  "data": {}
}

Example response (status 403)
{
  "status": 403,
  "message": "Forbidden",
  "data": {}
}

Example response (status 500)
{
  "status": 500,
  "message": "Internal Server Error",
  "data": {}
}
