九章智算云

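Query Pod YAML

Retrieves the complete YAML definition of a specified Pod in a distributed training task, so you can check whether that node's low-level configuration (scheduling parameters, resource declarations, image, storage mounts, environment variables, and so on) matches expectations. The Pod name can be obtained from the execution records in the task details.

GET https://api.alayanew.com/v1/training/instance/{id}/{podName}/yaml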

Authorizations

Authorization (String, required)

Authenticate with an Open API Key you have already obtained. Example: Bearer [YOUR_API_KEY]
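
A minimal sketch of building the request headers without hard-coding the key. The environment variable name ALAYANEW_API_KEY and the helper build_headers are hypothetical choices for this example, not part of the API.

```python
import os


def build_headers(api_key: str) -> dict[str, str]:
    """Return the headers used by the request examples on this page."""
    if not api_key:
        raise ValueError("API key is empty")
    return {
        "accept": "application/json",
        "Authorization": f"Bearer {api_key}",
    }


# ALAYANEW_API_KEY is a hypothetical variable name chosen for this sketch.
api_key = os.environ.get("ALAYANEW_API_KEY", "")
if api_key:
    headers = build_headers(api_key)
```

Reading the key from the environment keeps it out of source control; any secret store would serve the same purpose.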

Path Parameters

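id (String, required)

Training task ID (the id field from the task list). Example: ca78d6b9-e196-5a0f-b1be-ab036b3cb91a

podName (String, required)

Pod name. In a multi-node, multi-GPU task, each node corresponds to one Pod; the name can be obtained from the task details. Example: worker-0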
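
The two path parameters can be assembled into the request URL as sketched below. The helper name build_pod_yaml_url is hypothetical; percent-encoding each segment is a defensive assumption, since the values shown on this page contain only URL-safe characters.

```python
from urllib.parse import quote

BASE_URL = "https://api.alayanew.com/v1/training/instance"


def build_pod_yaml_url(task_id: str, pod_name: str) -> str:
    """Build the endpoint URL from the id and podName path parameters."""
    # safe="" also escapes "/", so a value cannot add extra path segments.
    return f"{BASE_URL}/{quote(task_id, safe='')}/{quote(pod_name, safe='')}/yaml"


print(build_pod_yaml_url("ca78d6b9-e196-5a0f-b1be-ab036b3cb91a", "worker-0"))
# https://api.alayanew.com/v1/training/instance/ca78d6b9-e196-5a0f-b1be-ab036b3cb91a/worker-0/yaml
```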

Response

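status (Integer)

Business status code; 200 indicates success.

message (String)

Response message describing success or the reason for failure. Example: "OK"

data (String)

The Pod's complete YAML definition, returned as a string containing a standard Kubernetes Pod manifest with embedded line breaks.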
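
Because data is a JSON string rather than a nested object, decoding the response body turns the escaped line breaks into real ones, and the result can be printed or saved as a manifest. The sketch below uses a shortened copy of this page's success response; find_values is a hypothetical helper that does a simple line scan and is not a YAML parser.

```python
import json

# Shortened version of the 200 response, as raw text received over the wire.
raw_body = r'''{
  "status": 200,
  "message": "OK",
  "data": "apiVersion: v1\nkind: Pod\nmetadata:\n  name: worker-0\nspec:\n  containers:\n    - name: pytorch\n      image: harbor.zetyun.cn/anc-public/general/pytorch:2.3.1-gpu\n"
}'''

payload = json.loads(raw_body)
pod_yaml = payload["data"]  # a plain str; the JSON "\n" escapes are now real newlines


def find_values(manifest: str, key: str) -> list[str]:
    """Line-based scan for `key: value` entries; not a real YAML parser."""
    prefix = f"{key}:"
    values = []
    for line in manifest.splitlines():
        # Drop indentation and a leading list marker before matching the key.
        stripped = line.strip().removeprefix("- ")
        if stripped.startswith(prefix):
            values.append(stripped[len(prefix):].strip())
    return values


print(find_values(pod_yaml, "image"))
# ['harbor.zetyun.cn/anc-public/general/pytorch:2.3.1-gpu']
```

For anything beyond a quick check, loading the string with a full YAML parser is the more robust option.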

Request example (cURL)

curl -X 'GET' \
  'https://api.alayanew.com/v1/training/instance/ca78d6b9-e196-5a0f-b1be-ab036b3cb91a/worker-0/yaml' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer [YOUR_API_KEY]'

Request example (Python)

import requests

task_id = "ca78d6b9-e196-5a0f-b1be-ab036b3cb91a"
pod_name = "worker-0"
url = f"https://api.alayanew.com/v1/training/instance/{task_id}/{pod_name}/yaml"
headers = {
    "accept": "application/json",
    "Authorization": "Bearer [YOUR_API_KEY]"
}

# Set a timeout so the call cannot hang indefinitely.
response = requests.get(url, headers=headers, timeout=30)
print(response.json())

Request example (JavaScript)

const taskId = 'ca78d6b9-e196-5a0f-b1be-ab036b3cb91a';
const podName = 'worker-0';

fetch(`https://api.alayanew.com/v1/training/instance/${taskId}/${podName}/yaml`, {
  method: 'GET',
  headers: {
    'accept': 'application/json',
    'Authorization': 'Bearer [YOUR_API_KEY]'
  }
})
  .then(res => res.json())
  .then(console.log)
  .catch(console.error);

Response example (200)

{
  "status": 200,
  "message": "OK",
  "data": "apiVersion: v1\nkind: Pod\nmetadata:\n  name: worker-0\n  namespace: training\nspec:\n  containers:\n    - name: pytorch\n      image: harbor.zetyun.cn/anc-public/general/pytorch:2.3.1-gpu\n      resources:\n        limits:\n          nvidia.com/gpu: \"8\"\n"
}

Response example (403)

{
  "status": 403,
  "message": "Forbidden",
  "data": {}
}

Response example (500)

{
  "status": 500,
  "message": "Internal Server Error",
  "data": {}
}
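
In the failure examples on this page, data is an empty object instead of a string, so client code may want to check both the status field and the type of data before using it. A minimal sketch follows; PodYamlError and unwrap_pod_yaml are hypothetical names, and whether the HTTP status code mirrors the status field is not stated here, so only the JSON body is inspected.

```python
class PodYamlError(Exception):
    """Raised when the response envelope does not carry a YAML string."""

    def __init__(self, status, message):
        super().__init__(f"status={status}: {message}")
        self.status = status
        self.message = message


def unwrap_pod_yaml(payload: dict) -> str:
    """Return the YAML text from a decoded response, or raise PodYamlError."""
    status = payload.get("status")
    data = payload.get("data")
    # In the failure examples `data` is an empty object, so check the type too.
    if status != 200 or not isinstance(data, str):
        raise PodYamlError(status, payload.get("message", ""))
    return data


try:
    unwrap_pod_yaml({"status": 403, "message": "Forbidden", "data": {}})
except PodYamlError as err:
    print(err)  # status=403: Forbidden
```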
