DeployService (Expert)
Interface for deploying a base model as an Expert service.
POST https://api.alayanew.com/api/serverless-infer/v1/deployment/expert
Authorizations
Authorization: String, Header, Required
Authenticate with the Open API Key you have obtained, for example: plain Credential=[YOUR_AK],Signature=[YOUR_SK].
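A minimal Python sketch of assembling this header value. It assumes the plain scheme simply places the AK and SK into the string exactly as in the example above, with no additional signing step. The function name build_auth_header is hypothetical and not part of any official SDK.

```python
def build_auth_header(access_key: str, secret_key: str) -> str:
    """Return the Authorization value in the 'plain Credential=<AK>,Signature=<SK>' form.

    Assumption: the plain scheme sends the AK and SK as-is, as in the documented example.
    """
    if not access_key or not secret_key:
        raise ValueError("both the access key (AK) and secret key (SK) are required")
    return f"plain Credential={access_key},Signature={secret_key}"


# Headers for a JSON request to this endpoint (placeholder credentials).
headers = {
    "Authorization": build_auth_header("YOUR_AK", "YOUR_SK"),
    "Content-Type": "application/json",
}
print(headers["Authorization"])
```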
Body
application/json
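vksId: String (Required)
Vital Kubernetes Engine (VKS) cluster ID.
namespace: String (Required)
Vital Kubernetes Engine (VKS) namespace.
name: String (Required)
Service name.
servedName: List<String> (Required)
Internal model identifier(s).
modelId: String (Required)
Model ID.
headConfig: Object (Required)
Head container configuration: image, cmd, labels, env, args, resource (mem, cpu, gpu), and pvcMounts, as shown in the example request.
workerConfig: Object
Worker container configuration: workers plus the same fields as headConfig, as shown in the example request.
scale: Object (Required)
Autoscaling settings: max, min, rpsValue, and idleTime, as shown in the example request.
extensions: Object (Required)
Extension fields.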
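The body fields above can be checked before sending with a small Python sketch like the following. It only verifies presence and JSON types as listed on this page, treating workerConfig as optional because it is not marked Required. validate_body is a hypothetical helper, and the server may enforce further rules that this page does not describe.

```python
# Required top-level fields and their expected JSON types, as listed on this page.
REQUIRED_FIELDS = {
    "vksId": str,
    "namespace": str,
    "name": str,
    "servedName": list,
    "modelId": str,
    "headConfig": dict,
    "scale": dict,
    "extensions": dict,
}
# workerConfig is not marked Required, so it is only type-checked when present.
OPTIONAL_FIELDS = {"workerConfig": dict}


def validate_body(body: dict) -> list:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in body:
            problems.append(f"missing required field: {field}")
        elif not isinstance(body[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    for field, expected in OPTIONAL_FIELDS.items():
        if field in body and not isinstance(body[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    # servedName is a List<String>, so every element must be a string.
    served = body.get("servedName")
    if isinstance(served, list) and not all(isinstance(s, str) for s in served):
        problems.append("servedName should contain only strings")
    return problems
```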
Response
Status code: 200 (application/json)
code: Int
Return code indicating the result of the operation. Possible values:
0: success; the operation completed successfully.
-1: failure; see msg for details.
data: Object
Result data. On success it contains serviceId, the ID of the deployed service.
msg: String (Required)
Error message, returned when code is -1.
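A minimal Python sketch of handling this response shape. It assumes success is signalled only by code 0 and that msg carries the error text otherwise. parse_deploy_response and DeployError are hypothetical names, and HTTP-level errors (4xx/5xx) are outside its scope.

```python
import json


class DeployError(Exception):
    """Raised when the API reports a failed operation (code other than 0)."""


def parse_deploy_response(raw: str) -> dict:
    """Return the data object on success; raise DeployError otherwise."""
    payload = json.loads(raw)
    # int() accepts both 0 and "0", so this works whether code arrives
    # as a JSON number or as a string.
    code = int(payload.get("code", -1))
    if code != 0:
        raise DeployError(payload.get("msg") or f"deployment failed with code {code}")
    return payload.get("data") or {}
```

On success, the returned dictionary holds serviceId.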
curl --location --request POST 'https://api.alayanew.com/api/serverless-infer/v1/deployment/expert' \
--header 'Authorization: plain Credential=YOUR_AK,Signature=YOUR_SK' \
--header 'Content-Type: application/json' \
--data '
{
"name": "test-expert",
"namespace": "default",
"vksId": "vcacb50arkk4",
"servedName": ["testvllm"],
"modelId": "c486cdee-c316-4fc1-9f75-0d1741940f27",
"scale": {
"max": 3,
"min": 1,
"rpsValue": 10,
"idleTime": 60
},
"headConfig": {
"image": "registry.cn-hangzhou.aliyuncs.com/ls-2018/test:vllm-0.8.1p",
"cmd": ["sh", "-c", "test.sh"],
"labels": { },
"env": {
"VLLM_ALLOW_RUNTIME_LORA_UPDATING": "true"
},
"args": ["-Xmx", "52m"],
"resource": {
"mem": 32,
"cpu": 4,
"gpu": {
"gpuType": "nvidia.com/gpu-l40s",
"count": 1
}
},
"pvcMounts": [
{
"containerPath": "/scripts",
"pvcName": "test-name"
}
]
},
"workerConfig": {
"workers": 1,
"image": "registry.cn-hangzhou.aliyuncs.com/ls-2018/test:vllm-0.8.1p",
"cmd": ["sh", "-c", "test.sh"],
"labels": { },
"env": {
"VLLM_ALLOW_RUNTIME_LORA_UPDATING": "true"
},
"args": [],
"resource": {
"mem": 8,
"cpu": 4,
"gpu": {
"gpuType": "nvidia.com/gpu-l40s",
"count": 1
}
},
"pvcMounts": [
{
"containerPath": "/scripts",
"pvcName": "test-name"
}
]
},
"extensions": {
"usage": "test"
}
}'
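For reference, the same request can be expressed with Python's standard urllib.request module. This is a sketch, not an official SDK. build_deploy_request and send are hypothetical helpers, the credentials are placeholders, and the body is trimmed for brevity (workerConfig omitted, headConfig shortened), so a real deployment may need more of the fields from the cURL example. Building the Request object does not contact the server; only send() does.

```python
import json
import urllib.request

# Endpoint taken from this page.
URL = "https://api.alayanew.com/api/serverless-infer/v1/deployment/expert"


def build_deploy_request(body, access_key, secret_key):
    """Build (but do not send) the POST request shown in the cURL example."""
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"plain Credential={access_key},Signature={secret_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req, timeout=30.0):
    """Send the request and decode the JSON response body.

    urlopen raises urllib.error.HTTPError for non-2xx HTTP statuses.
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Trimmed version of the example body above.
example_body = {
    "name": "test-expert",
    "namespace": "default",
    "vksId": "vcacb50arkk4",
    "servedName": ["testvllm"],
    "modelId": "c486cdee-c316-4fc1-9f75-0d1741940f27",
    "scale": {"max": 3, "min": 1, "rpsValue": 10, "idleTime": 60},
    "headConfig": {
        "image": "registry.cn-hangzhou.aliyuncs.com/ls-2018/test:vllm-0.8.1p",
        "resource": {"mem": 32, "cpu": 4,
                     "gpu": {"gpuType": "nvidia.com/gpu-l40s", "count": 1}},
    },
    "extensions": {"usage": "test"},
}

req = build_deploy_request(example_body, "YOUR_AK", "YOUR_SK")
print(req.get_method(), req.full_url)
# result = send(req)  # performs the real API call
```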
Possible HTTP status codes: 200, 400, 401, 403, 404, 500.
Example response (200):
{
"code": 0,
"data": {
"serviceId":""
},
"msg": ""
}
Tip: In this deployment mode, Expert models support autoscaling, and the number of instances can be scaled dynamically down to zero.
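As a sketch of a scale object that permits scale-to-zero, the Python helper below sets min to 0 by default. make_scale is hypothetical. The checks (non-negative integers, max of at least 1, min not exceeding max) are assumptions rather than documented limits. The default rpsValue and idleTime are copied from the example request, and this page does not define their exact semantics.

```python
def make_scale(max_instances, min_instances=0, rps_value=10, idle_time=60):
    """Build the `scale` object; min_instances=0 allows scaling down to zero.

    Assumed constraints (not documented on this page): all values are
    non-negative integers, max >= 1, and min <= max.
    """
    values = {
        "max": max_instances,
        "min": min_instances,
        "rpsValue": rps_value,
        "idleTime": idle_time,
    }
    for key, value in values.items():
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{key} must be a non-negative integer")
    if max_instances < 1:
        raise ValueError("max must be at least 1")
    if min_instances > max_instances:
        raise ValueError("min must not exceed max")
    return values
```

For example, make_scale(3) yields {"max": 3, "min": 0, "rpsValue": 10, "idleTime": 60}, which can be placed in the request body's scale field.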