Deploy Service (QuickStart)

Last updated: 2025-07-02 17:32:25

API for deploying a base model.

POST

https://api.alayanew.com/api/serverless-infer/v1/deployment

Authorizations

Authorization: String, header, required

Authenticate with your Open API Key (AK/SK pair), for example: plain Credential=[YOUR_AK],Signature=[YOUR_SK].
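A minimal sketch of assembling this header in Python, assuming the AK and SK are inserted verbatim without the square brackets (as in the cURL example further down). The helper name build_auth_header is hypothetical.

```python
def build_auth_header(ak: str, sk: str) -> dict[str, str]:
    """Return the Authorization header (hypothetical helper).

    Follows the documented pattern: plain Credential=<AK>,Signature=<SK>
    """
    if not ak or not sk:
        raise ValueError("both AK and SK are required")
    return {"Authorization": f"plain Credential={ak},Signature={sk}"}


# Usage with placeholder keys
headers = build_auth_header("YOUR_AK", "YOUR_SK")
print(headers["Authorization"])  # plain Credential=YOUR_AK,Signature=YOUR_SK
```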

Body

application/json

vksId: String, required

Vital Kubernetes Engine (VKS) cluster ID.

namespace: String, required

Vital Kubernetes Engine (VKS) namespace.

name: String, required

Service name.

servedName: Array[String], required

Internal model identifiers (served names).

modelId: String, required

Model ID.

backend: String, required

Inference backend, e.g., vllm or sglang.

backendVersion: String, required

Backend version, e.g., 0.9.0.1 for vllm or 0.4.6 for sglang.

backendArgs: Array[String], required

Backend launch arguments; may be an empty array.

resource: Object, required

resource.workers: Int, required

resource.cpu: Int, required

resource.mem: Int, required

resource.gpu: Object

resource.gpu.count: Int

resource.gpu.gpuType: String, e.g., nvidia.com/gpu-l40s
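The required fields listed above can be checked on the client before a request is sent. This is a sketch only: validate_deployment_body is a hypothetical name, and it checks just the types listed on this page (the service may enforce further rules).

```python
from typing import Any

# Required top-level fields and their expected Python types, per the field list above
REQUIRED_FIELDS: dict[str, type] = {
    "vksId": str,
    "namespace": str,
    "name": str,
    "servedName": list,
    "modelId": str,
    "backend": str,
    "backendVersion": str,
    "backendArgs": list,
    "resource": dict,
}
# Required integer fields inside "resource"
REQUIRED_RESOURCE_FIELDS = ("workers", "cpu", "mem")


def validate_deployment_body(body: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a deployment request body (empty if none)."""
    problems: list[str] = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in body:
            problems.append(f"missing field: {field}")
        elif not isinstance(body[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    # Both list fields are documented as lists of strings
    for key in ("servedName", "backendArgs"):
        value = body.get(key)
        if isinstance(value, list) and not all(isinstance(v, str) for v in value):
            problems.append(f"{key} should contain only strings")
    resource = body.get("resource")
    if isinstance(resource, dict):
        for field in REQUIRED_RESOURCE_FIELDS:
            value = resource.get(field)
            # bool is a subclass of int, so exclude it explicitly
            if not isinstance(value, int) or isinstance(value, bool):
                problems.append(f"resource.{field} should be an int")
        gpu = resource.get("gpu")
        if gpu is not None and not isinstance(gpu, dict):
            problems.append("resource.gpu should be an object")
    return problems
```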

Response

Status code:

200

application/json

code: Int

Result code of the operation: 0 indicates success; -1 indicates failure.

data: Object

data.serviceId: String

ID of the deployed service.

msg: String, required

Error information, returned when code is -1.
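A sketch of handling the response body in Python. It assumes code arrives as a JSON integer and treats any non-zero value as a failure, which also covers the "others for failure" wording in the example response. parse_deploy_response and DeploymentError are hypothetical names.

```python
import json
from typing import Any


class DeploymentError(RuntimeError):
    """Raised when the API reports a non-zero code (hypothetical exception type)."""


def parse_deploy_response(raw: str) -> str:
    """Extract data.serviceId from a deployment response body.

    code == 0 is treated as success; any other code is a failure and
    msg is surfaced in the exception.
    """
    payload: dict[str, Any] = json.loads(raw)
    code = payload.get("code")
    if code != 0:
        raise DeploymentError(f"deployment failed (code={code}): {payload.get('msg', '')}")
    data = payload.get("data") or {}
    service_id = data.get("serviceId")
    if not service_id:
        raise DeploymentError("response has code 0 but no data.serviceId")
    return service_id
```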

cURL

curl --location --request POST 'https://api.alayanew.com/api/serverless-infer/v1/deployment' \
  --header 'Authorization: plain Credential=YOUR_AK,Signature=YOUR_SK' \
  --header 'Content-Type: application/json' \
  --data '{
    "vksId": "vcacb50arkk4",
    "namespace": "default",
    "name": "testvllm",
    "servedName": [
      "testvllm"
    ],
    "modelId": "c486cdee-c316-4fc1-9f75-0d1741940f27",
    "backend": "vllm",
    "backendVersion": "0.9.0.1",
    "backendArgs": [],
    "resource": {
      "workers": 2,
      "cpu": 4,
      "gpu": {
        "count": 1,
        "gpuType": "nvidia.com/gpu-l40s"
      },
      "mem": 10
    }
  }'
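The same request can be built in Python with only the standard library. This sketch mirrors the cURL body above. build_deploy_request is a hypothetical helper, and the send step is left commented out because it needs real credentials and network access.

```python
import json
import urllib.request

URL = "https://api.alayanew.com/api/serverless-infer/v1/deployment"


def build_deploy_request(ak: str, sk: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request equivalent to the cURL example."""
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"plain Credential={ak},Signature={sk}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


body = {
    "vksId": "vcacb50arkk4",
    "namespace": "default",
    "name": "testvllm",
    "servedName": ["testvllm"],
    "modelId": "c486cdee-c316-4fc1-9f75-0d1741940f27",
    "backend": "vllm",
    "backendVersion": "0.9.0.1",
    "backendArgs": [],
    "resource": {
        "workers": 2,
        "cpu": 4,
        "gpu": {"count": 1, "gpuType": "nvidia.com/gpu-l40s"},
        "mem": 10,
    },
}

req = build_deploy_request("YOUR_AK", "YOUR_SK", body)
# To actually send it (requires valid credentials and network access):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```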

Response example (status codes: 200, 400, 401, 403, 404, 500):

{
  "code": "required,int, Status code (0 for success, others for failure)",
  "data": {
    "serviceId":""
  },
  "msg": "optional,string,"
}

Parameter configuration depends on the backend. The vLLM and SGLang settings are described below.

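vLLM, version 0.9.0.1. The following applies at startup:

1. The argument --enable-lora is added automatically.
2. The environment variable VLLM_ALLOW_RUNTIME_LORA_UPDATING is set to true.
3. The server listens on the default port 30000 (--port 30000).
4. If backend is vllm, backendVersion is 0.9.0.1, and no backendArgs are specified, the arguments --max-loras 16 --max-lora-rank 16 are added automatically.
5. The default scale parameters are as follows: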

{
  "max": 5,
  "min": 1,
  "idleTime": 60,
  "rpsValue": 100,
  "inFlightValue": null
}
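The vLLM startup rules above can be approximated as a pure function, which helps predict the final launch arguments. This is a hypothetical helper, not the platform's code. It assumes an empty backendArgs array counts as "not specified" and that user arguments are appended after the platform's own. It also leaves out --port handling.

```python
def vllm_launch_config(body: dict) -> tuple[list[str], dict[str, str]]:
    """Approximate the args and env added for vLLM 0.9.0.1 (hypothetical helper)."""
    if body.get("backend") != "vllm" or body.get("backendVersion") != "0.9.0.1":
        raise ValueError("this sketch only models vllm 0.9.0.1")
    user_args = list(body.get("backendArgs") or [])
    args = ["--enable-lora"]  # always added at startup
    if not user_args:
        # LoRA limits are added only when no backendArgs are supplied
        args += ["--max-loras", "16", "--max-lora-rank", "16"]
    # Environment variables are strings, so "true" stands in for the documented true
    env = {"VLLM_ALLOW_RUNTIME_LORA_UPDATING": "true"}
    return args + user_args, env
```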

SGLang, version 0.4.6. The following applies at startup:

The default scale parameters are as follows:

{
  "max": 5,
  "min": 1,
  "idleTime": 60,
  "rpsValue": 100,
  "inFlightValue": null
}

If backend is sglang, backendVersion is 0.4.6, and no backendArgs are specified, the argument --tp ${gpuCount} is added automatically, where gpuCount = resource.workers * resource.gpu.count. If no GPU information is provided, this argument is omitted.
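A sketch of the --tp rule above, using a hypothetical helper. It assumes non-empty user backendArgs are passed through unchanged and suppress the default.

```python
def sglang_default_args(body: dict) -> list[str]:
    """Approximate the launch args for sglang 0.4.6 (hypothetical helper)."""
    if body.get("backend") != "sglang" or body.get("backendVersion") != "0.4.6":
        raise ValueError("this sketch only models sglang 0.4.6")
    if body.get("backendArgs"):
        return list(body["backendArgs"])  # user args given: no default --tp
    resource = body.get("resource") or {}
    gpu = resource.get("gpu")
    if not gpu:
        return []  # no GPU information: --tp is omitted
    gpu_count = resource["workers"] * gpu["count"]  # gpuCount = workers * gpu.count
    return ["--tp", str(gpu_count)]
```

With the resources from the cURL example (workers 2, gpu.count 1), gpuCount is 2, so the sketch yields --tp 2.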

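Tip

In the current deployment mode, QuickStart models cannot be scaled down to zero instances. When adjusting resources, keep the instance count at 1 or more.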
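Suppose min in the scale settings is the minimum instance count (this page does not define the fields). Under that assumption, the tip above becomes a simple client-side check. ScaleConfig is a hypothetical type that mirrors the default scale JSON with the same camelCase keys. The min <= max rule is a further assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScaleConfig:
    """Scale settings with the documented defaults (field meanings assumed)."""
    max: int = 5
    min: int = 1
    idleTime: int = 60
    rpsValue: Optional[int] = 100
    inFlightValue: Optional[int] = None

    def validate(self) -> None:
        # QuickStart deployments cannot scale to zero, so the floor must be >= 1
        if self.min < 1:
            raise ValueError("min must be at least 1 for QuickStart models")
        # Assumed invariant: the lower bound cannot exceed the upper bound
        if self.min > self.max:
            raise ValueError("min must not exceed max")
```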