Alaya NeW Cloud

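Training Task Pod List

Returns an overview of every pod (container group) that belongs to a distributed training task, including each pod's name, IP address, status, node, and terminal connection address. It is typically used on the task detail page to troubleshoot a multi-node training run node by node.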

GET https://api.alayanew.com/v1/training/instance/{id}/pods

Authorizations

Authorization (String, required)

Authenticate with an Open API Key that you have already obtained. Example: Bearer [YOUR_API_KEY]

Path Parameters

id (String, required)

Training task ID (the id field returned by the task list). Example: ca78d6b9-e196-5a0f-b1be-ab036b3cb91a
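
The path parameter and the Authorization header described above can be assembled as in the following minimal sketch. The helper name `build_pods_request` is hypothetical and not part of the Alaya NeW API; only the URL pattern and the `Bearer` scheme are taken from this page.

```python
from urllib.parse import quote

BASE_URL = "https://api.alayanew.com/v1/training/instance"


def build_pods_request(task_id: str, api_key: str) -> tuple[str, dict[str, str]]:
    """Return the URL and headers for the pod list request (hypothetical helper)."""
    if not task_id:
        raise ValueError("task_id must not be empty")
    # Percent-encode the ID so that unexpected characters cannot alter the path.
    url = f"{BASE_URL}/{quote(task_id, safe='')}/pods"
    headers = {
        "accept": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    return url, headers


url, headers = build_pods_request("ca78d6b9-e196-5a0f-b1be-ab036b3cb91a", "YOUR_API_KEY")
print(url)
```

Keeping URL construction separate from the HTTP call makes it easy to check the request before sending it with any client library.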

Response

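status (Integer)

Business status code; 200 indicates success.

message (String)

Response message. Example: "OK"

data (Array)

List of pods. Each element corresponds to one pod of the task (for example, one each for the master and worker nodes).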

Properties of each element in data:

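podName (String)

Pod name (the name of the container group). This is the podName required when querying a pod's basic information, resource information, or scheduling information. Example: "tn-exqr5k7lvj7k-master-0"

podIp (String)

Pod IP, the address assigned to the pod within the cluster network. Example: "10.233.250.123"

status (String)

Running status of the pod. Example: "running"

reason (String)

Reason for the status. Explains why the pod is in an abnormal or waiting state; usually empty when the pod is running normally.

nodeName (String)

Name of the node the pod is scheduled on. Example: "gpu-021"

createdTime (String)

Creation time. Example: "2024-01-01 12:00:00"

updatedTime (String)

Last modification time. Example: "2023-12-10 23:59:59"

podWebSSH (String)

Connection address of the terminal (web shell), used to open a shell in the container from a browser for debugging; empty if the terminal is not enabled.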
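
For node-by-node troubleshooting, the pod list can be reduced to a per-node count plus the pods that need attention. The sketch below works on an already decoded `data` array. It assumes that any status other than "running" deserves a closer look (this page documents only the "running" value), and the second sample pod, its "pending" status, and its reason text are invented for illustration. The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def summarize_pods(pods):
    """Return a per-node pod count and the pods that are not running."""
    per_node = Counter(pod.get("nodeName") or "<unscheduled>" for pod in pods)
    # Assumption: any status other than "running" is worth a closer look.
    flagged = [
        (pod["podName"], pod.get("status"), pod.get("reason") or "")
        for pod in pods
        if pod.get("status") != "running"
    ]
    return per_node, flagged


def parse_time(value):
    """Parse timestamps in the 'YYYY-MM-DD HH:MM:SS' form shown in the examples."""
    # The time zone is not documented here, so the result is a naive datetime.
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


# Sample data; the second pod is made up to show the flagged case.
pods = [
    {"podName": "tn-exqr5k7lvj7k-master-0", "status": "running", "reason": "",
     "nodeName": "gpu-021", "createdTime": "2024-01-01 12:00:00"},
    {"podName": "tn-exqr5k7lvj7k-worker-0", "status": "pending",
     "reason": "example reason", "nodeName": "", "createdTime": "2024-01-01 12:00:05"},
]
per_node, flagged = summarize_pods(pods)
print(per_node)
print(flagged)
```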

Request example (cURL)

```shell
curl -X 'GET' \
  'https://api.alayanew.com/v1/training/instance/ca78d6b9-e196-5a0f-b1be-ab036b3cb91a/pods' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer [YOUR_API_KEY]'
```
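
Request example (Python)

```python
import requests

# Training task ID taken from the task list.
task_id = "ca78d6b9-e196-5a0f-b1be-ab036b3cb91a"
url = f"https://api.alayanew.com/v1/training/instance/{task_id}/pods"
headers = {
    "accept": "application/json",
    "Authorization": "Bearer [YOUR_API_KEY]",  # replace with your Open API Key
}

# A timeout keeps the call from hanging indefinitely on network problems.
response = requests.get(url, headers=headers, timeout=30)
print(response.json())
```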

Request example (JavaScript)

```javascript
const taskId = 'ca78d6b9-e196-5a0f-b1be-ab036b3cb91a';

fetch(`https://api.alayanew.com/v1/training/instance/${taskId}/pods`, {
  method: 'GET',
  headers: {
    'accept': 'application/json',
    'Authorization': 'Bearer [YOUR_API_KEY]'
  }
})
  .then(res => res.json())
  .then(console.log)
  .catch(console.error);
```

Response example (200)

```json
{
  "status": 200,
  "message": "OK",
  "data": [
    {
      "podName": "tn-exqr5k7lvj7k-master-0",
      "podIp": "10.233.250.123",
      "status": "running",
      "reason": "string",
      "nodeName": "gpu-021",
      "createdTime": "2024-01-01 12:00:00",
      "updatedTime": "2023-12-10 23:59:59",
      "podWebSSH": "string"
    }
  ]
}
```

Response example (403)

```json
{
  "status": 403,
  "message": "Forbidden",
  "data": {}
}
```

Response example (500)

```json
{
  "status": 500,
  "message": "Internal Server Error",
  "data": {}
}
```
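
Because both success and error responses share the same `status` / `message` / `data` envelope, a caller can check the business status code before touching `data`. The following is a minimal sketch: `ApiError` and `unwrap` are hypothetical names, and whether the HTTP status code mirrors the `status` field is not stated on this page, so only the body is inspected.

```python
class ApiError(Exception):
    """Raised when the response envelope reports a non-200 business status."""

    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def unwrap(payload):
    """Return the pod list on success, otherwise raise ApiError."""
    status = payload.get("status")
    if status != 200:
        raise ApiError(status, payload.get("message", ""))
    # Error examples carry an empty object in data; success carries a list.
    return payload.get("data") or []


ok = {"status": 200, "message": "OK", "data": [{"podName": "tn-exqr5k7lvj7k-master-0"}]}
forbidden = {"status": 403, "message": "Forbidden", "data": {}}

print(unwrap(ok))
try:
    unwrap(forbidden)
except ApiError as err:
    print("request failed:", err)
```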
