Deploy Model (Expert)

API for deploying a base model.

POST

https://api.alayanew.com/api/serverless-infer/v1/deployment/expert

Authorizations

Authorization: String, Header, Required

Authenticate with the Serverless API Key you have obtained, for example: plain Credential=[YOUR_AK],Signature=[YOUR_SK].
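As a minimal sketch of the header format described above, the value can be assembled from the two key parts. The helper name build_auth_header is hypothetical, and the sketch assumes the square brackets in the example are placeholders rather than literal characters, which is how the cURL example on this page writes the header.

```python
def build_auth_header(access_key: str, secret_key: str) -> dict:
    """Return the Authorization header in 'plain Credential=...,Signature=...' form.

    Hypothetical helper: the format is copied from the example on this page.
    """
    if not access_key or not secret_key:
        raise ValueError("both access_key and secret_key are required")
    value = f"plain Credential={access_key},Signature={secret_key}"
    return {"Authorization": value}


headers = build_auth_header("YOUR_AK", "YOUR_SK")
print(headers["Authorization"])
```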

Body

application/json

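vksId: String, Required

ID of the elastic container cluster (VKS).

namespace: String, Required

Namespace of the elastic container cluster (VKS).

name: String, Required

Service name.

servedName: List<String>, Required

Internal identifier(s) of the model.

modelId: String, Required

Model ID.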

headConfig: Object, Required

headConfig.image: String, Required

headConfig.imagePullSecret: Object

headConfig.labels: Object

headConfig.env: Object

headConfig.cmd: Array[String], Required

headConfig.args: Array[String]

headConfig.resource: Object, Required

headConfig.pvcMounts: Array

workerConfig: Object

workerConfig.image: String, Required

workerConfig.imagePullSecret: Object

workerConfig.labels: Object

workerConfig.env: Object

workerConfig.cmd: Array[String], Required

workerConfig.args: Array[String]

workerConfig.resource: Object, Required

workerConfig.workers: Int, Required

scale: Object, Required

scale.max: Int, Required

scale.min: Int, Required

scale.rpsValue: Int

scale.inFlightValue: Int

scale.idleTime: Int

extension: Object, Required

Extension field.
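Because the body has many required fields, a client-side check before sending can catch omissions early. The following is a minimal sketch, not part of the API: the function name missing_fields and the two lookup tables are hypothetical, and the required flags are taken from the field list on this page. Note that the request example on this page spells the last key extensions, so the spelling the service accepts is an assumption here.

```python
# Required keys as listed in the Body section (assumption: the list is complete).
REQUIRED_TOP = [
    "vksId", "namespace", "name", "servedName",
    "modelId", "headConfig", "scale", "extension",
]
# Required keys inside nested objects; workerConfig itself is optional,
# so its children are only checked when it is present.
REQUIRED_NESTED = {
    "headConfig": ["image", "cmd", "resource"],
    "workerConfig": ["image", "cmd", "resource", "workers"],
    "scale": ["max", "min"],
}


def missing_fields(body: dict) -> list:
    """Return dotted names of required fields that are absent from body."""
    missing = [key for key in REQUIRED_TOP if key not in body]
    for parent, children in REQUIRED_NESTED.items():
        section = body.get(parent)
        if isinstance(section, dict):
            missing += [f"{parent}.{c}" for c in children if c not in section]
    return missing


body = {
    "vksId": "vcacb50arkk4",
    "namespace": "default",
    "name": "test-expert",
    "servedName": ["testvllm"],
    "modelId": "c486cdee-c316-4fc1-9f75-0d1741940f27",
    "headConfig": {"image": "example:tag", "cmd": ["sh", "-c", "test.sh"], "resource": {}},
    "scale": {"max": 3},
    "extension": {},
}
print(missing_fields(body))
```

In this usage example the only omission is scale.min, so that is the single name reported.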

Response

Status code:

200

application/json

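code: Int

Result code indicating the outcome of the operation.

-1 indicates failure; the error details are returned in msg.

0 indicates that the operation completed successfully.

data: Object

data.serviceId: String

msg: String

Error message, returned when code is -1.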
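The response fields above suggest a simple handling pattern: treat code 0 as success and read data.serviceId, otherwise surface msg. The sketch below assumes exactly that; the names DeploymentError and extract_service_id are hypothetical, and the code is parsed leniently so that both 0 and "0" count as success.

```python
import json


class DeploymentError(Exception):
    """Raised when the response reports a non-zero code (hypothetical name)."""


def extract_service_id(raw: str) -> str:
    """Parse a response body and return data.serviceId, or raise on failure."""
    payload = json.loads(raw)
    try:
        code = int(payload.get("code"))
    except (TypeError, ValueError):
        code = -1  # a missing or malformed code is treated as a failure
    if code != 0:
        raise DeploymentError(payload.get("msg") or "deployment failed")
    return (payload.get("data") or {}).get("serviceId", "")


ok = '{"code": 0, "data": {"serviceId": "svc-123"}, "msg": ""}'
print(extract_service_id(ok))
```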

cURL

curl --location --request POST 'https://api.alayanew.com/api/serverless-infer/v1/deployment/expert' \
     --header 'Authorization: plain Credential=YOUR_AK,Signature=YOUR_SK' \
     --header 'Content-Type: application/json' \
     --data '{
        "name": "test-expert",
        "namespace": "default",
        "vksId": "vcacb50arkk4",
        "servedName": ["testvllm"],
        "modelId": "c486cdee-c316-4fc1-9f75-0d1741940f27",
        "scale": {
            "max": 3,
            "min": 1,
            "rpsValue": 10,
            "idleTime": 60
        },
        "headConfig": {
            "image": "registry.cn-hangzhou.aliyuncs.com/ls-2018/test:vllm-0.8.1p",
            "cmd": ["sh", "-c", "test.sh"],
            "labels": {
                "usage": "test"
            },
            "env": {
                "VLLM_ALLOW_RUNTIME_LORA_UPDATING": "true"
            },
            "args": ["-Xmx", "52m"],
            "resource": {
                "workers": 3,
                "mem": 32,
                "cpu": 4,
                "gpu": {
                    "gpuType": "nvidia.com/gpu-l40s",
                    "count": 1
                }
            },
            "pvcMounts": [
                {
                    "containerPath": "/scripts",
                    "pvcName": "test-name"
                }
            ]
        },
        "workerConfig": {
            "workers": 3,
            "image": "registry.cn-hangzhou.aliyuncs.com/ls-2018/test:vllm-0.8.1p",
            "cmd": ["sh", "-c", "test.sh"],
            "labels": {
                "usage": "test"
            },
            "env": {
                "VLLM_ALLOW_RUNTIME_LORA_UPDATING": "true"
            },
            "args": [""],
            "resource": {
                "workers": 3,
                "mem": 8,
                "cpu": 4,
                "gpu": {
                    "gpuType": "nvidia.com/gpu-l40s",
                    "count": 1
                }
            },
            "pvcMounts": [
                {
                    "containerPath": "/scripts",
                    "pvcName": "test-name"
                }
            ]
        },
        "extensions": {
            "usage": "test"
        }
     }'
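For readers who prefer Python, the same request can be sketched with the standard library only. This is an illustration under assumptions, not an official client: the function names build_request and deploy are hypothetical, the URL and header format are copied from the cURL example, and the response is assumed to be a JSON body.

```python
import json
import urllib.request

URL = "https://api.alayanew.com/api/serverless-infer/v1/deployment/expert"


def build_request(body: dict, access_key: str, secret_key: str) -> urllib.request.Request:
    """Build the POST request; nothing is sent until it is passed to urlopen."""
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"plain Credential={access_key},Signature={secret_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def deploy(body: dict, access_key: str, secret_key: str, timeout: float = 30.0) -> dict:
    """Send the deployment request and return the decoded JSON response."""
    request = build_request(body, access_key, secret_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (performs a real network call, so it is left commented out):
# result = deploy(body, "YOUR_AK", "YOUR_SK")
# print(result["code"], result.get("msg"))
```

Separating build_request from deploy keeps the request inspectable without touching the network, which is convenient for checking the payload and headers before a real deployment.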

Response status codes: 200, 400, 401, 403, 404, 500

{
  "code": 0,
  "data": {
    "serviceId":""
  },
  "msg": "optional,string,"
}