Alaya NeW Cloud

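List Pod Events

Returns the Kubernetes event list for a specified Pod in a distributed training task. The events trace key state changes and warnings as the Pod is scheduled, pulls its image, starts its containers, and runs, which makes this endpoint the first place to look when diagnosing Pod startup failures or blocked scheduling. The Pod name is available from the task details.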

GET https://api.alayanew.com/v1/training/instance/{id}/{podName}/events

Authorizations

Authorization (String, required)

Authenticate with an Open API Key you have already obtained. Example: Bearer [YOUR_API_KEY]

Path Parameters

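id (String, required)

Training task ID (the id field from the task list). Example: ca78d6b9-e196-5a0f-b1be-ab036b3cb91a

podName (String, required)

Pod name. In a multi-node, multi-GPU task each node corresponds to one Pod; the name is available from the task details. Example: worker-0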
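
The two path parameters and the Authorization header can be assembled with a small helper. The sketch below is illustrative only: `build_events_url` and `build_headers` are hypothetical names, not part of any official SDK, and percent-encoding the path segments is a defensive assumption rather than a documented requirement.

```python
from urllib.parse import quote

# Base path taken from the endpoint shown above.
BASE_URL = "https://api.alayanew.com/v1/training/instance"


def build_events_url(task_id: str, pod_name: str) -> str:
    """Return the events endpoint URL for one Pod of a training task."""
    if not task_id or not pod_name:
        raise ValueError("task_id and pod_name are both required")
    # safe="" also escapes "/" so a stray slash cannot change the path.
    return f"{BASE_URL}/{quote(task_id, safe='')}/{quote(pod_name, safe='')}/events"


def build_headers(api_key: str) -> dict:
    """Return the request headers, using the documented Bearer scheme."""
    return {
        "accept": "application/json",
        "Authorization": f"Bearer {api_key}",
    }


print(build_events_url("ca78d6b9-e196-5a0f-b1be-ab036b3cb91a", "worker-0"))
```

The resulting URL and headers can be passed directly to an HTTP client such as `requests.get`.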

Response

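status (Integer)

Business status code; 200 indicates success.

message (String)

Response message describing success or the reason for failure. Example: "OK"

data (Array)

Event list for the Pod, ordered by event time; each element is one container event.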

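Properties of each element in data:

eventType (String)

Event type. Values: Normal (normal event), Warning (warning event). When troubleshooting, check Warning events first. Example: Normal

firstTimestamp (String)

Time the event first occurred (date-time). Example: "2025-01-01 00:00:00"

message (String)

Detailed event message describing what happened. Example: "Started container pytorch"

reason (String)

Event reason (a short classification identifier). Examples: "Scheduled", "Pulled", "Started", "FailedScheduling"

eventSource (String)

Component that emitted the event. Examples: "kubelet", "default-scheduler"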
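
Since Warning events are the ones to check first, a client will often filter the list before reading it. The following is a minimal sketch, assuming that `firstTimestamp` always uses the "YYYY-MM-DD HH:MM:SS" layout of the documented example; `warning_events` is a hypothetical helper and the sample events are made up for illustration, not real API output.

```python
from datetime import datetime

# Assumed timestamp layout, inferred from the documented example value.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def warning_events(events):
    """Return only the Warning events, oldest first."""
    warnings = [e for e in events if e.get("eventType") == "Warning"]
    return sorted(
        warnings,
        key=lambda e: datetime.strptime(e["firstTimestamp"], TIMESTAMP_FORMAT),
    )


# Illustrative sample data only; messages are placeholders.
sample_events = [
    {"eventType": "Warning", "firstTimestamp": "2025-01-01 00:00:05",
     "message": "illustrative warning (later)", "reason": "FailedScheduling",
     "eventSource": "default-scheduler"},
    {"eventType": "Normal", "firstTimestamp": "2025-01-01 00:00:10",
     "message": "Started container pytorch", "reason": "Started",
     "eventSource": "kubelet"},
    {"eventType": "Warning", "firstTimestamp": "2025-01-01 00:00:01",
     "message": "illustrative warning (earlier)", "reason": "FailedScheduling",
     "eventSource": "default-scheduler"},
]

for event in warning_events(sample_events):
    print(event["firstTimestamp"], event["reason"], event["message"])
```

Sorting on the client side is a precaution: the list is documented as ordered by event time, but the sort direction is not stated.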

Request examples

cURL

curl -X 'GET' \
  'https://api.alayanew.com/v1/training/instance/ca78d6b9-e196-5a0f-b1be-ab036b3cb91a/worker-0/events' \
  -H 'accept: application/json' \
  -H 'Authorization: Bearer [YOUR_API_KEY]'

Python

import requests

# Path parameters: training task ID and Pod name.
task_id = "ca78d6b9-e196-5a0f-b1be-ab036b3cb91a"
pod_name = "worker-0"
url = f"https://api.alayanew.com/v1/training/instance/{task_id}/{pod_name}/events"
headers = {
    "accept": "application/json",
    "Authorization": "Bearer [YOUR_API_KEY]"
}

# A timeout keeps the call from hanging indefinitely on network problems.
response = requests.get(url, headers=headers, timeout=10)
print(response.json())

JavaScript

// Path parameters: training task ID and Pod name.
const taskId = 'ca78d6b9-e196-5a0f-b1be-ab036b3cb91a';
const podName = 'worker-0';

fetch(`https://api.alayanew.com/v1/training/instance/${taskId}/${podName}/events`, {
  method: 'GET',
  headers: {
    'accept': 'application/json',
    'Authorization': 'Bearer [YOUR_API_KEY]'
  }
})
  .then(res => res.json())
  .then(console.log)
  .catch(console.error);

Response examples

200

{
  "status": 200,
  "message": "OK",
  "data": [
    {
      "eventType": "Normal",
      "firstTimestamp": "2025-01-01 00:00:00",
      "message": "Started container pytorch",
      "reason": "Started",
      "eventSource": "kubelet"
    }
  ]
}

403

{
  "status": 403,
  "message": "Forbidden",
  "data": {}
}

500

{
  "status": 500,
  "message": "Internal Server Error",
  "data": {}
}
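
The response examples suggest that callers should check the `status` field in the body before reading `data`, because error responses carry an empty object there instead of an array. One way to handle this is sketched below; `EventsApiError` and `unwrap_events` are hypothetical names, and treating any status other than 200 as a failure is an assumption based only on the examples shown here.

```python
class EventsApiError(Exception):
    """Raised when the response body reports a non-200 business status."""

    def __init__(self, status, message):
        super().__init__(f"events API returned status {status}: {message}")
        self.status = status
        self.message = message


def unwrap_events(payload: dict) -> list:
    """Return the event list from a parsed response body, or raise on failure."""
    status = payload.get("status")
    if status != 200:
        raise EventsApiError(status, payload.get("message", ""))
    data = payload.get("data")
    # Guard against a non-list "data" value, as seen in the error examples.
    return data if isinstance(data, list) else []


ok_body = {"status": 200, "message": "OK", "data": [{"reason": "Started"}]}
print(unwrap_events(ok_body))

try:
    unwrap_events({"status": 403, "message": "Forbidden", "data": {}})
except EventsApiError as err:
    print(err)
```

With `requests`, the parsed body from `response.json()` can be passed straight to `unwrap_events`.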
