九章智算云

Update a Training Task Template

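Performs a full update of an existing training task template, identified by template ID. You can change the resource specification, image, storage mounts, environment variables, start command, and other settings. The template ID is available from the training task template list, and the request body fields are the same as those used to create a training task template.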
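Because this is a full update, a client that only wants to change a few fields will probably need to send the complete configuration. The sketch below overlays a set of changes on a template's current configuration to produce a full request body. It assumes the current configuration is already available as a dictionary (how it is retrieved is not covered on this page) and that nested objects such as resource and env should be merged key by key. merge_template is a hypothetical helper, not part of the API.

```python
import copy


def merge_template(current: dict, changes: dict) -> dict:
    """Return a full request body: `current` overlaid with `changes`.

    Nested dicts (e.g. "resource", "env") are merged key by key; lists and
    scalars in `changes` replace the old value outright. This merge policy
    is an assumption made for illustration.
    """
    merged = copy.deepcopy(current)
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_template(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical current configuration, trimmed to a few fields.
current = {
    "name": "pytorch-2gpu-template",
    "resource": {"gpuCount": "1", "memoryGB": "2"},
    "priority": 3,
}
body = merge_template(current, {"resource": {"gpuCount": "2"}, "priority": 1})
```

The resulting body keeps every untouched field, so it can be sent as the complete PUT payload.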

PUT https://api.alayanew.com/v1/training/template/{id}/update

Authorizations

Authorization (String, required)

Authenticate with the Open API Key you have obtained. Example: Bearer [YOUR_API_KEY]
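A minimal sketch of building the request headers with the Bearer scheme described above. The environment variable name ALAYA_API_KEY and the helper build_headers are assumptions for illustration; only the Authorization header format comes from this page.

```python
import os


def build_headers(api_key: str) -> dict:
    """Build JSON request headers carrying the Open API Key as a Bearer token."""
    key = api_key.strip()
    if not key:
        raise ValueError("API key is empty")
    return {
        "accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": f"Bearer {key}",
    }


# ALAYA_API_KEY is a hypothetical variable name; the placeholder is used
# when it is not set.
headers = build_headers(os.environ.get("ALAYA_API_KEY") or "[YOUR_API_KEY]")
```

Reading the key from the environment keeps it out of source files.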

Path Parameters

id (String, required)

ID of the training task template to update, taken from the id field in the template list. Example: tpl_6f1a62c0a5bf4e0c9b2f
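The id value is substituted into the request path. A small sketch of building the URL, with the ID percent-encoded in case it ever contains reserved characters; update_url is a hypothetical helper name.

```python
from urllib.parse import quote

BASE_URL = "https://api.alayanew.com/v1/training/template"


def update_url(template_id: str) -> str:
    """Return the update endpoint URL for the given template ID."""
    if not template_id:
        raise ValueError("template_id is required")
    # safe="" also encodes "/" so the ID always stays a single path segment.
    return f"{BASE_URL}/{quote(template_id, safe='')}/update"


url = update_url("tpl_6f1a62c0a5bf4e0c9b2f")
```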

Request Body

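name (String)

Template name, used to tell templates apart in the template list. Example: pytorch-2gpu-template

desc (String)

Template description, explaining which scenarios the template is suited for. Example: common 2-GPU training template

aidcId (Integer)

Intelligent computing center ID, specifying the center the template is bound to by default. Example: 2

trainingType (String)

Training type. Allowed values: PRE_TRAINING (pre-training), HPC (high-performance computing). Example: PRE_TRAINING

trainingFramwork (String)

Training framework. Allowed values: PyTorch, DeepSpeed, MPI, TensorFlow. Example: TensorFlow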

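imageType (String)

Image type. Allowed values: general (base image), application (application image), private (private image). Must match the image field. Example: general

image (String)

Container image address (the image selected by the user). Example: harbor.zetyun.cn/anc-public/general/tensorflow:2.16.1-gpu-jupyter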
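The enumerated fields above (trainingType, trainingFramwork, imageType) can be checked on the client before the request is sent. A minimal sketch, assuming the values are case-sensitive and that the lists on this page are complete; enum_errors is a hypothetical helper.

```python
# Allowed values as listed on this page. Note the field name really is
# spelled "trainingFramwork" in the API.
ALLOWED = {
    "trainingType": {"PRE_TRAINING", "HPC"},
    "trainingFramwork": {"PyTorch", "DeepSpeed", "MPI", "TensorFlow"},
    "imageType": {"general", "application", "private"},
}


def enum_errors(body: dict) -> list[str]:
    """Return one message per enumerated field whose value is not allowed."""
    errors = []
    for field, allowed in ALLOWED.items():
        if field in body and body[field] not in allowed:
            errors.append(
                f"{field}={body[field]!r} is not one of {sorted(allowed)}"
            )
    return errors


errors = enum_errors({"trainingType": "PRE_TRAINING", "imageType": "public"})
```

Fields that are absent from the body are skipped, since this page does not mark any body field as required.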

env (Object)

Environment variables, as key-value pairs.

empty (Boolean)
Additional properties (Object)

Additional properties. Example: "ENV_MODE":"production","MAX_CONN":"200"
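In the example above, every environment variable value is a string, including the numeric one. A small sketch that normalizes values to strings before building the env object; whether the API also accepts non-string values is not stated on this page, so treat this as a cautious assumption. to_env is a hypothetical helper.

```python
def to_env(variables: dict) -> dict:
    """Return a copy of `variables` with every value converted to a string."""
    env = {}
    for key, value in variables.items():
        if not isinstance(key, str) or not key:
            raise ValueError(f"invalid environment variable name: {key!r}")
        env[key] = value if isinstance(value, str) else str(value)
    return env


env = to_env({"ENV_MODE": "production", "MAX_CONN": 200})
```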

resource (Object)

Resource configuration.

empty (Boolean)
Additional properties (Object)

Additional properties. Example: "type":"worker","gpuName":"NVIDIA-L40S-PCIE-48G","cpuCores":"1","gpuCount":"1","memoryGB":"2","productCode":"PRD-QTT","productName":"量子训练","workerCount":1,"productPrice":"0.0002B".
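The resource example mixes types: cpuCores, gpuCount, and memoryGB are strings, while workerCount is an integer. One way to keep those types straight is a small typed container that mirrors the example. The Resource class below is a hypothetical client-side helper, and the field types are inferred from the example only, not from a published schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class Resource:
    """Client-side mirror of the resource object shown in the example."""
    type: str
    gpuName: str
    cpuCores: str      # string in the example, e.g. "1"
    gpuCount: str      # string in the example, e.g. "1"
    memoryGB: str      # string in the example, e.g. "2"
    productCode: str
    productName: str
    workerCount: int   # integer in the example
    productPrice: str


resource = asdict(Resource(
    type="worker",
    gpuName="NVIDIA-L40S-PCIE-48G",
    cpuCores="1",
    gpuCount="1",
    memoryGB="2",
    productCode="PRD-QTT",
    productName="量子训练",
    workerCount=1,
    productPrice="0.0002B",
))
```

asdict returns a plain dictionary that can be placed under the resource key of the request body.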

enableAutoRetry (Boolean)

Whether to retry automatically on failure. Example: true

storageConfigs (Array<Object>)

Storage configuration.

empty (Boolean)
Items (Object)

Example: "storageId":"0000-0000-0000-0000","storageType":"nas-capacity","fileDirectory":"nas123","mountPath":"/root/nas/123","onlyRead":true.
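Each storageConfigs item maps a storage directory to a mount path inside the container. The sketch below runs two client-side sanity checks on the list. It assumes mount paths should be absolute and should not be repeated across items, which is ordinary container practice but is not stated on this page. check_storage_configs is a hypothetical helper.

```python
import posixpath


def check_storage_configs(configs: list[dict]) -> list[str]:
    """Return a list of problems found in the mountPath values."""
    problems = []
    seen = set()
    for i, cfg in enumerate(configs):
        path = cfg.get("mountPath", "")
        if not path.startswith("/"):
            problems.append(f"storageConfigs[{i}]: mountPath {path!r} is not absolute")
            continue
        # Normalize so "/root/nas/123" and "/root/nas/123/" count as the same path.
        normalized = posixpath.normpath(path)
        if normalized in seen:
            problems.append(f"storageConfigs[{i}]: mountPath {path!r} is duplicated")
        seen.add(normalized)
    return problems


problems = check_storage_configs([
    {
        "storageId": "0000-0000-0000-0000",
        "storageType": "nas-capacity",
        "fileDirectory": "nas123",
        "mountPath": "/root/nas/123",
        "onlyRead": True,
    }
])
```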

maxRetryCount (Integer)

Maximum number of retries. Example: 3

enableTimeoutCancel (Boolean)

Whether to cancel the task when it times out. Example: true

timeoutHours (Integer)

Training task timeout, in hours. Example: 1
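The retry and timeout settings come in pairs: enableAutoRetry with maxRetryCount, and enableTimeoutCancel with timeoutHours. A minimal consistency check, assuming each switch needs a positive integer in its companion field when it is turned on. This page does not say how the server treats a missing or zero value, so the rule is an assumption. policy_problems is a hypothetical helper.

```python
def _positive_int(value) -> bool:
    """True for integers greater than zero (bool is excluded on purpose)."""
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def policy_problems(body: dict) -> list[str]:
    """Return problems with the retry and timeout settings in a request body."""
    problems = []
    if body.get("enableAutoRetry") and not _positive_int(body.get("maxRetryCount")):
        problems.append("enableAutoRetry is true but maxRetryCount is not a positive integer")
    if body.get("enableTimeoutCancel") and not _positive_int(body.get("timeoutHours")):
        problems.append("enableTimeoutCancel is true but timeoutHours is not a positive integer")
    return problems


problems = policy_problems({
    "enableAutoRetry": True,
    "maxRetryCount": 3,
    "enableTimeoutCancel": True,
    "timeoutHours": 1,
})
```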

startCommand (String)

Start command. Example: "python train.py --data /root/nas/123/data"

priority (Integer)

Priority. Example: 3

Response

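status (Integer)

Business status code; 200 indicates success.

message (String)

Response message. Example: "OK"

data (Object)

Result of the update. No additional business data is returned on success.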
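Since the response body carries its own status field, a client may want to check it in addition to the HTTP status code. A minimal sketch of that check on an already-decoded response body; TemplateUpdateError and check_result are hypothetical names, and whether a failed update also returns a non-2xx HTTP status is not stated on this page.

```python
class TemplateUpdateError(Exception):
    """Raised when the response body reports a non-200 business status."""


def check_result(result: dict) -> dict:
    """Return the `data` object on success, or raise TemplateUpdateError."""
    status = result.get("status")
    if status != 200:
        raise TemplateUpdateError(
            f"update failed: status={status}, message={result.get('message')!r}"
        )
    return result.get("data") or {}


data = check_result({"status": 200, "message": "OK", "data": {}})
```

With the requests library, the decoded body from response.json() can be passed straight to check_result.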

cURL

curl -X 'PUT' \
  'https://api.alayanew.com/v1/training/template/08ba7a0e-af00-4072-a5e5-1298ed6c1aa0/update' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer [YOUR_API_KEY]' \
  -d '{
    "name": "my training task",
    "desc": "my training task",
    "aidcId": 2,
    "trainingType": "PRE_TRAINING",
    "trainingFramwork": "TensorFlow",
    "imageType": "general",
    "image": "harbor.zetyun.cn/anc-public/general/tensorflow:2.16.1-gpu-jupyter",
    "env": {
      "ENV_MODE": "production",
      "MAX_CONN": "200"
    },
    "resource": {
      "type": "worker",
      "gpuName": "NVIDIA-L40S-PCIE-48G",
      "cpuCores": "1",
      "gpuCount": "1",
      "memoryGB": "2",
      "productCode": "PRD-QTT",
      "productName": "量子训练",
      "workerCount": 1,
      "productPrice": "0.0002B"
    },
    "enableAutoRetry": true,
    "storageConfigs": [
      {
        "storageId": "0000-0000-0000-0000",
        "storageType": "nas-capacity",
        "fileDirectory": "nas123",
        "mountPath": "/root/nas/123",
        "onlyRead": true
      }
    ],
    "maxRetryCount": 3,
    "enableTimeoutCancel": true,
    "timeoutHours": 1,
    "startCommand": "python train.py --data /root/nas/123/data",
    "priority": 3
  }'

Python

import requests

template_id = "08ba7a0e-af00-4072-a5e5-1298ed6c1aa0"
url = f"https://api.alayanew.com/v1/training/template/{template_id}/update"
headers = {
    "accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": "Bearer [YOUR_API_KEY]"
}
payload = {
    "name": "my training task",
    "desc": "my training task",
    "aidcId": 2,
    "trainingType": "PRE_TRAINING",
    "trainingFramwork": "TensorFlow",
    "imageType": "general",
    "image": "harbor.zetyun.cn/anc-public/general/tensorflow:2.16.1-gpu-jupyter",
    "env": {
        "ENV_MODE": "production",
        "MAX_CONN": "200"
    },
    "resource": {
        "type": "worker",
        "gpuName": "NVIDIA-L40S-PCIE-48G",
        "cpuCores": "1",
        "gpuCount": "1",
        "memoryGB": "2",
        "productCode": "PRD-QTT",
        "productName": "量子训练",
        "workerCount": 1,
        "productPrice": "0.0002B"
    },
    "enableAutoRetry": True,
    "storageConfigs": [
        {
            "storageId": "0000-0000-0000-0000",
            "storageType": "nas-capacity",
            "fileDirectory": "nas123",
            "mountPath": "/root/nas/123",
            "onlyRead": True
        }
    ],
    "maxRetryCount": 3,
    "enableTimeoutCancel": True,
    "timeoutHours": 1,
    "startCommand": "python train.py --data /root/nas/123/data",
    "priority": 3
}

# Set a timeout so the call cannot hang indefinitely.
response = requests.put(url, headers=headers, json=payload, timeout=30)
response.raise_for_status()
print(response.json())

JavaScript

const templateId = '08ba7a0e-af00-4072-a5e5-1298ed6c1aa0';
const payload = {
  name: 'my training task',
  desc: 'my training task',
  aidcId: 2,
  trainingType: 'PRE_TRAINING',
  trainingFramwork: 'TensorFlow',
  imageType: 'general',
  image: 'harbor.zetyun.cn/anc-public/general/tensorflow:2.16.1-gpu-jupyter',
  env: { ENV_MODE: 'production', MAX_CONN: '200' },
  resource: {
    type: 'worker',
    gpuName: 'NVIDIA-L40S-PCIE-48G',
    cpuCores: '1',
    gpuCount: '1',
    memoryGB: '2',
    productCode: 'PRD-QTT',
    productName: '量子训练',
    workerCount: 1,
    productPrice: '0.0002B'
  },
  enableAutoRetry: true,
  storageConfigs: [
    {
      storageId: '0000-0000-0000-0000',
      storageType: 'nas-capacity',
      fileDirectory: 'nas123',
      mountPath: '/root/nas/123',
      onlyRead: true
    }
  ],
  maxRetryCount: 3,
  enableTimeoutCancel: true,
  timeoutHours: 1,
  startCommand: 'python train.py --data /root/nas/123/data',
  priority: 3
};

fetch(`https://api.alayanew.com/v1/training/template/${templateId}/update`, {
  method: 'PUT',
  headers: {
    'accept': 'application/json',
    'Content-Type': 'application/json',
    'Authorization': 'Bearer [YOUR_API_KEY]'
  },
  body: JSON.stringify(payload)
})
  .then(res => {
    if (!res.ok) {
      throw new Error(`HTTP error! status: ${res.status}`);
    }
    return res.json();
  })
  .then(console.log)
  .catch(console.error);

Response example (200)

{
  "status": 200,
  "message": "OK",
  "data": {}
}

Response example (403)

{
  "status": 403,
  "message": "Forbidden",
  "data": {}
}

Response example (500)

{
  "status": 500,
  "message": "Internal Server Error",
  "data": {}
}
