Offline Deployment (Expert)
Updates the model information of an expert deployment offline, after the instance has been stopped.
PUT
https://api.alayanew.com/api/serverless-infer/v1/deployment/expert/{serviceId}
Authorizations
Authorization: String, Header, Required
Users authenticate with the Open API Key they have obtained, for example: plain Credential=[YOUR_AK],Signature=[YOUR_SK].
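As a minimal sketch of the credential format described above, the header value can be assembled as follows. The helper name `build_auth_header` is hypothetical and not part of the API; only the `plain Credential=...,Signature=...` layout comes from this page.

```python
def build_auth_header(access_key: str, secret_key: str) -> dict:
    """Return the Authorization header in the 'plain' scheme shown on this page.

    The function name and argument names are illustrative assumptions.
    """
    if not access_key or not secret_key:
        raise ValueError("both the access key and the secret key are required")
    return {"Authorization": f"plain Credential={access_key},Signature={secret_key}"}


# Placeholder credentials; substitute your own AK and SK.
headers = build_auth_header("YOUR_AK", "YOUR_SK")
print(headers["Authorization"])
```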
Path Parameters
serviceId: String, Required
Service ID.
Body
application/json
vksId: String, Required
Vital Kubernetes Engine (VKS) ID.
namespace: String, Required
Vital Kubernetes Engine (VKS) namespace.
name: String, Required
Service name.
servedName: List<String>, Required
Internal model identifier.
modelId: String
Model ID.
headConfig: Object, Required
workerConfig: Object
scale: Object, Required
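Before sending a request, it may help to check locally that every field marked Required above is present. The sketch below assumes only the field names listed in this section; the helper `missing_required_fields` is hypothetical, and the server may apply further validation that is not documented here.

```python
# Top-level body fields marked "Required" in the list above.
REQUIRED_FIELDS = ("vksId", "namespace", "name", "servedName", "headConfig", "scale")


def missing_required_fields(body: dict) -> list:
    """Return the required field names that are absent or None in the body."""
    return [field for field in REQUIRED_FIELDS if body.get(field) is None]


# Example body that deliberately omits headConfig.
body = {
    "vksId": "vcacb50arkk4",
    "namespace": "default",
    "name": "test-expert",
    "servedName": ["testsglang"],
    "scale": {"min": 1, "max": 3},
}
print(missing_required_fields(body))  # prints ['headConfig']
```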
Response
Status code: application/json
200
code: Int
Result code indicating the outcome of the operation. Possible values: 0, -1.
0 indicates that the operation completed successfully.
data: Object
msg: String, Required
Error message returned when code is -1.
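The response fields above suggest a simple client-side pattern: treat code 0 as success and surface msg otherwise. The following is a sketch under that assumption; `DeploymentUpdateError` and `unwrap_response` are hypothetical names, and the contents of data are not specified on this page.

```python
import json


class DeploymentUpdateError(Exception):
    """Raised when the response code is not the success value 0."""


def unwrap_response(raw: str) -> dict:
    """Parse a JSON response body and return its data object on success."""
    payload = json.loads(raw)
    if payload.get("code") == 0:
        # data may be empty or missing; normalize it to a dict.
        return payload.get("data") or {}
    raise DeploymentUpdateError(payload.get("msg", "unknown error"))


print(unwrap_response('{"code": 0, "data": {}, "msg": ""}'))
```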
cURL
curl --location --request PUT 'https://api.alayanew.com/api/serverless-infer/v1/deployment/expert/38fbfc3d-6a88-4c35-b8b6-9efc83949d47' \
--header 'Authorization:plain Credential=YOUR_AK,Signature=YOUR_SK' \
--header 'Content-Type: application/json' \
--data '{
  "name": "test-expert",
  "namespace": "default",
  "vksId": "vcacb50arkk4",
  "servedName": ["testsglang"],
  "modelId": "c486cdee-c316-4fc1-9f75-0d1741940f27",
  "scale": {
    "max": 3,
    "min": 1,
    "rpsValue": 10,
    "idleTime": 60
  },
  "headConfig": {
    "image": "registry.cn-hangzhou.aliyuncs.com/ls-2018/sglang:0.4.6",
    "cmd": ["sh", "-c", "test.sh"],
    "labels": {
      "usage": "test"
    },
    "env": {
      "VLLM_ALLOW_RUNTIME_LORA_UPDATING": "true"
    },
    "args": [],
    "resource": {
      "workers": 3,
      "mem": 32,
      "cpu": 4,
      "gpu": {
        "gpuType": "nvidia.com/gpu-l40s",
        "count": 1
      }
    },
    "pvcMounts": [
      {
        "containerPath": "/scripts",
        "pvcName": "test-name"
      }
    ]
  },
  "workerConfig": {
    "workers": 3,
    "image": "registry.cn-hangzhou.aliyuncs.com/ls-2018/sglang:0.4.6",
    "cmd": ["sh", "-c", "test.sh"],
    "labels": {
      "usage": "test"
    },
    "env": {
      "VLLM_ALLOW_RUNTIME_LORA_UPDATING": "true"
    },
    "args": [""],
    "resource": {
      "workers": 3,
      "mem": 8,
      "cpu": 4,
      "gpu": {
        "gpuType": "nvidia.com/gpu-l40s",
        "count": 1
      }
    },
    "pvcMounts": [
      {
        "containerPath": "/scripts",
        "pvcName": "test-name"
      }
    ]
  },
  "extensions": {
    "usage": "test"
  }
}'
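The cURL request above can be expressed in Python with only the standard library. This is a sketch, not an official client: `build_update_request` is a hypothetical helper, the body is abbreviated to a few fields from the example, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Endpoint taken from this page; the service ID is appended as a path parameter.
BASE_URL = "https://api.alayanew.com/api/serverless-infer/v1/deployment/expert"


def build_update_request(service_id, body, access_key, secret_key):
    """Build (but do not send) the PUT request for an offline expert update."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{service_id}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"plain Credential={access_key},Signature={secret_key}",
            "Content-Type": "application/json",
        },
    )


# Abbreviated body; a real call needs all required fields (including headConfig).
body = {"name": "test-expert", "namespace": "default", "vksId": "vcacb50arkk4"}
req = build_update_request(
    "38fbfc3d-6a88-4c35-b8b6-9efc83949d47", body, "YOUR_AK", "YOUR_SK"
)
print(req.get_method(), req.full_url)

# To actually send it (requires valid credentials and network access):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Building the request separately from sending it keeps the example inspectable without credentials.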
Response example (200; other documented status codes: 400, 401, 403, 404, 500)
{
  "code": 0,
  "data": {},
  "msg": "string"
}