Deploy Service (QuickStart)

Last updated: 2025-07-02 17:32:25
Interface for deploying a base model as a service.
POST
https://api.alayanew.com/api/serverless-infer/v1/deployment
Authorizations
Authorization (String, Header, Required)

Authenticate with the Open API Key you have obtained, for example: plain Credential=[YOUR_AK],Signature=[YOUR_SK].

Body
application/json
vksId (String, Required)

Vital Kubernetes Engine (VKS) Cluster ID.

namespace (String, Required)

Vital Kubernetes Engine (VKS) Namespace.

name (String, Required)

Service name.

servedName (List<String>, Required)

Internal model identifier.

modelId (String, Required)

Model ID.

backend (String, Required)

Backend service, e.g., vllm/sglang.

backendVersion (String, Required)

Backend service version.

backendArgs (Array[String], Required)

Backend service arguments.

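resource (Object, Required)

Resource configuration. The request example on this page uses the fields workers, cpu, gpu (with count and gpuType), and mem.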
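
Since every body field is marked Required, a client may want to check a payload before sending it. The following is a minimal client-side sketch, not part of the API: the helper name `missing_or_mistyped` is hypothetical, and the mapping from the documented types to Python types (String to `str`, List/Array to `list`, Object to `dict`) is an assumption. The sample values are taken from the request example on this page.

```python
# Documented required body fields, mapped to the Python type assumed for each.
REQUIRED_FIELDS = {
    "vksId": str,
    "namespace": str,
    "name": str,
    "servedName": list,
    "modelId": str,
    "backend": str,
    "backendVersion": str,
    "backendArgs": list,
    "resource": dict,
}


def missing_or_mistyped(body: dict) -> list:
    """Describe every required field that is absent or has an unexpected type."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in body:
            problems.append(f"{field}: missing")
        elif not isinstance(body[field], expected):
            problems.append(f"{field}: expected {expected.__name__}")
    return problems


# Body copied from the request example on this page.
body = {
    "vksId": "vcacb50arkk4",
    "namespace": "default",
    "name": "testvllm",
    "servedName": ["testvllm"],
    "modelId": "c486cdee-c316-4fc1-9f75-0d1741940f27",
    "backend": "vllm",
    "backendVersion": "0.9.0.1",
    "backendArgs": [],
    "resource": {
        "workers": 2,
        "cpu": 4,
        "gpu": {"count": 1, "gpuType": "nvidia.com/gpu-l40s"},
        "mem": 10,
    },
}

print(missing_or_mistyped(body))
```

An empty list means all nine documented fields are present with the assumed types. Note that `backendArgs` is required even when it is an empty array.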

Response
application/json
code (Int)

Status code representing the result of the operation. 0 indicates that the operation completed successfully.

data (Object)

Result data. The response example on this page shows a serviceId field.

msg (String, Required)

Returns error information when the code is -1.

Request example

curl --location --request POST 'https://api.alayanew.com/api/serverless-infer/v1/deployment' \
     --header 'Authorization: plain Credential=YOUR_AK,Signature=YOUR_SK' \
     --header 'Content-Type: application/json' \
     --data '{
       "vksId": "vcacb50arkk4",
       "namespace": "default",
       "name": "testvllm",
       "servedName": [
         "testvllm"
       ],
       "modelId": "c486cdee-c316-4fc1-9f75-0d1741940f27",
       "backend": "vllm",
       "backendVersion": "0.9.0.1",
       "backendArgs": [],
       "resource": {
         "workers": 2,
         "cpu": 4,
         "gpu": {
           "count": 1,
           "gpuType": "nvidia.com/gpu-l40s"
         },
         "mem": 10
       }
     }'
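
The same request can be assembled from Python with the standard library. This is a minimal sketch, assuming the Authorization value has exactly the form used in the curl command; the function name `build_deploy_request` is hypothetical, and the body passed in the demo is abbreviated, so a real call needs all required fields. The request is only built here, not sent.

```python
import json
import urllib.request

DEPLOY_URL = "https://api.alayanew.com/api/serverless-infer/v1/deployment"


def build_deploy_request(ak: str, sk: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    return urllib.request.Request(
        DEPLOY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Same header value format as in the curl example.
            "Authorization": f"plain Credential={ak},Signature={sk}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Abbreviated body for illustration only.
req = build_deploy_request("YOUR_AK", "YOUR_SK", {"name": "testvllm"})
print(req.get_method(), req.full_url)
```

With real credentials and a complete body, passing `req` to `urllib.request.urlopen` would send it.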
Response example

{
  "code": "required, int, status code (0 for success, others for failure)",
  "data": {
    "serviceId": ""
  },
  "msg": "optional, string"
}
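
A caller typically checks `code` before reading `data`. The sketch below assumes the response is a JSON object with the three documented fields and treats any non-zero `code` as a failure; `DeploymentError`, `extract_service_id`, and the service ID `svc-123` are hypothetical names and values used only for illustration.

```python
import json


class DeploymentError(Exception):
    """Raised when the response code is not 0 (hypothetical helper type)."""


def extract_service_id(raw: str) -> str:
    """Return data.serviceId on success, or raise with msg on failure."""
    resp = json.loads(raw)
    code = resp.get("code")
    if code != 0:
        # msg carries the error information when the call fails.
        raise DeploymentError(resp.get("msg") or f"deployment failed with code {code}")
    return resp["data"]["serviceId"]


# Illustrative success response; the serviceId value is made up.
ok = '{"code": 0, "data": {"serviceId": "svc-123"}, "msg": ""}'
print(extract_service_id(ok))
```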

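Different backend types require different parameter configurations, as described below.

For vllm version 0.9.0.1, the startup behavior is as follows.

  1. Parameter set automatically at startup: --enable-lora
  2. Environment variable set automatically at startup: "VLLM_ALLOW_RUNTIME_LORA_UPDATING": true
  3. Default port at startup: 3000, --port 30000
  4. When backend is vllm, backendVersion is 0.9.0.1, and no backendArgs are specified, the following parameters are set automatically at startup: --max-loras 16 --max-lora-rank 16
  5. The default scale parameters are as follows.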
{
  "max": 5,
  "min": 1,
  "idleTime": 60,
  "rpsValue": 100,
  "inFlightValue": null
}
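
The argument rules in items 1 and 4 can be modeled on the client side to predict which flags a deployment will start with. This is a rough sketch under assumptions: the function name `effective_vllm_args` is hypothetical, the order in which the platform places the automatic flags relative to user arguments is not documented, and the port setting is left out.

```python
def effective_vllm_args(backend: str, backend_version: str, backend_args: list) -> list:
    """Model the documented automatic flags for vllm 0.9.0.1 (ordering assumed)."""
    if backend != "vllm" or backend_version != "0.9.0.1":
        # The rules above are only documented for vllm 0.9.0.1.
        return list(backend_args)
    args = ["--enable-lora"]  # item 1: always set automatically
    if not backend_args:
        # item 4: LoRA limits are set only when no backendArgs are given
        args += ["--max-loras", "16", "--max-lora-rank", "16"]
    return args + list(backend_args)


print(effective_vllm_args("vllm", "0.9.0.1", []))
```

Under these assumptions, a request that supplies its own backendArgs does not receive the `--max-loras` and `--max-lora-rank` defaults, so it would need to set them explicitly if they are wanted.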
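Tip

In the current deployment mode, QuickStart models do not support scaling the number of instances down to zero. When adjusting the resource scale, keep the number of instances at 1 or higher.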
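
The no-scale-to-zero rule can be enforced before a scale change is submitted. The sketch below is client-side only and rests on assumptions: that the `min` field of the scale parameters is the minimum instance count, and that `max` should not be below `min`. The helper name `resolve_scale` is hypothetical, and how scale settings are submitted to the platform is not covered on this page.

```python
# Default scale parameters as documented above (JSON null becomes None).
DEFAULT_SCALE = {
    "max": 5,
    "min": 1,
    "idleTime": 60,
    "rpsValue": 100,
    "inFlightValue": None,
}


def resolve_scale(overrides=None) -> dict:
    """Merge overrides onto the defaults and reject scale-to-zero settings."""
    scale = {**DEFAULT_SCALE, **(overrides or {})}
    if scale["min"] < 1:
        raise ValueError("QuickStart deployments cannot scale to zero; min must be >= 1")
    if scale["max"] < scale["min"]:
        # Assumed sanity check, not stated in the documentation.
        raise ValueError("max must be >= min")
    return scale


print(resolve_scale({"max": 3}))
```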