Deploy a Model (QuickStart)

Deploys a base model as a service.

POST

https://api.alayanew.com/api/serverless-infer/v1/deployment

Authorizations

Authorization: String, Header, Required

Authenticate with the Serverless API Key you have obtained, for example: plain Credential=[YOUR_AK],Signature=[YOUR_SK].

Body

application/json

vksId: String, Required

ID of the elastic container cluster (VKS).

namespace: String, Required

Namespace of the elastic container cluster (VKS).

name: String, Required

Service name.

servedName: List<String>, Required

Internal identifier of the model.

modelId: String, Required

Model ID.

backend: String, Required

Backend service: vllm or sglang.

backendVersion: String, Required

Backend service version.

backendArgs: Array[String], Required

Backend service arguments.

resource: Object, Required

resource.workers: Int, Required

resource.cpu: Int, Required

resource.mem: Int, Required

resource.gpu: Object
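
The Required markers above can be checked on the client before a request is sent. The following is a minimal sketch only: the helper names missing_fields and validate_payload are hypothetical, and the server's own validation rules may be stricter than what is modelled here.

```python
# Field names are taken from the Body section; everything else is illustrative.
REQUIRED_TOP_LEVEL = (
    "vksId", "namespace", "name", "servedName", "modelId",
    "backend", "backendVersion", "backendArgs", "resource",
)
REQUIRED_RESOURCE = ("workers", "cpu", "mem")  # resource.gpu is not marked Required
KNOWN_BACKENDS = ("vllm", "sglang")


def missing_fields(payload: dict) -> list:
    """Return the dotted names of Required fields that are absent from payload."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in payload]
    resource = payload.get("resource")
    if isinstance(resource, dict):
        missing += [f"resource.{key}" for key in REQUIRED_RESOURCE if key not in resource]
    return missing


def validate_payload(payload: dict) -> None:
    """Raise ValueError if a Required field is absent or the backend is unknown."""
    missing = missing_fields(payload)
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    if payload["backend"] not in KNOWN_BACKENDS:
        raise ValueError(f"unsupported backend: {payload['backend']!r}")
```

Calling validate_payload(payload) before sending turns an incomplete body into a local error instead of a failed API call.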

Response

Status code:

200

application/json

code: Int

Result code of the operation: 0 indicates that the operation completed successfully; -1 indicates failure.

data: Object

data.serviceId: String

msg: String

Error information, returned when code is -1.
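
A client would typically check code before reading data. The sketch below assumes the response body is the JSON object described above; DeploymentError and extract_service_id are hypothetical names, and the sample strings in the usage lines are made up for illustration.

```python
import json


class DeploymentError(Exception):
    """Raised when the response reports a code other than 0."""


def extract_service_id(response_text: str) -> str:
    """Parse a response body and return data.serviceId, or raise on failure."""
    body = json.loads(response_text)
    if body.get("code") != 0:
        # msg carries the error information when code is -1.
        raise DeploymentError(body.get("msg") or "deployment failed")
    return body["data"]["serviceId"]


if __name__ == "__main__":
    ok = '{"code": 0, "data": {"serviceId": "example-id"}, "msg": ""}'
    print(extract_service_id(ok))
```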

cURL

curl --location --request POST 'https://api.alayanew.com/api/serverless-infer/v1/deployment' \
     --header 'Authorization: plain Credential=YOUR_AK,Signature=YOUR_SK' \
     --header 'Content-Type: application/json' \
     --data '
      {
        "vksId": "vcacb50arkk4",
        "namespace": "default",
        "name": "testvllm",
        "servedName": [
            "testvllm"
        ],
        "modelId": "c486cdee-c316-4fc1-9f75-0d1741940f27",
        "backend": "vllm",
        "backendVersion": "0.9.0.1",
        "backendArgs": [],
        "resource": {
            "workers": 2,
            "cpu": 4,
            "gpu": {
                "count": 1,
                "gpuType": "nvidia.com/gpu-l40s"
            },
            "mem": 10
        }
      }'
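
The same request can be sketched in Python using only the standard library. This is an illustration, not an official client: build_request is a hypothetical helper, the payload values are copied from the cURL example, and the actual send is left commented out so the snippet runs without network access or real credentials.

```python
import json
import urllib.request

API_URL = "https://api.alayanew.com/api/serverless-infer/v1/deployment"


def build_request(payload: dict, access_key: str, secret_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for the deployment endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"plain Credential={access_key},Signature={secret_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "vksId": "vcacb50arkk4",
    "namespace": "default",
    "name": "testvllm",
    "servedName": ["testvllm"],
    "modelId": "c486cdee-c316-4fc1-9f75-0d1741940f27",
    "backend": "vllm",
    "backendVersion": "0.9.0.1",
    "backendArgs": [],
    "resource": {
        "workers": 2,
        "cpu": 4,
        "gpu": {"count": 1, "gpuType": "nvidia.com/gpu-l40s"},
        "mem": 10,
    },
}

req = build_request(payload, "YOUR_AK", "YOUR_SK")

# Uncomment to send the request with real credentials:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```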

Status codes: 200, 400, 401, 403, 404, 500

{
  "code": "required, int, status code (0 indicates success; any other value indicates failure)",
  "data": {
    "serviceId": ""
  },
  "msg": "optional, string"
}

Parameter configuration differs by backend type. The details for each backend are given below.

vLLM
SGLang

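vLLM, version 0.9.0.1. Startup behaves as follows.

1. The following parameter is specified automatically at startup: --enable-lora.
2. The following environment variable is set automatically at startup: "VLLM_ALLOW_RUNTIME_LORA_UPDATING": true.
3. The default port at startup is 30000 (--port 30000).
4. When backend is vllm, backendVersion is 0.9.0.1, and no backendArgs are specified, the following parameters are specified automatically at startup: --max-loras 16 --max-lora-rank 16.
5. The default scale parameters are as follows.

{
  "max": 5,
  "min": 1,
  "idleTime": 60,
  "rpsValue": 100,
  "inFlightValue": null
}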
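
The LoRA-related rules for vLLM 0.9.0.1 can be read as a small decision. The sketch below models only the parameters the platform is described as adding; auto_added_vllm_args is a hypothetical name, treating both an empty list and None as "no backendArgs specified" is an assumption, and how user-supplied arguments are merged with the automatic ones is not modelled.

```python
# Defaults described for backend "vllm", backendVersion "0.9.0.1".
LORA_DEFAULTS = ["--max-loras", "16", "--max-lora-rank", "16"]


def auto_added_vllm_args(backend_args=None) -> list:
    """Return the arguments described as being added automatically at startup."""
    added = ["--enable-lora"]  # always added for this version
    if not backend_args:       # assumption: empty list or None both count as unspecified
        added += LORA_DEFAULTS
    return added


if __name__ == "__main__":
    print(auto_added_vllm_args([]))
    print(auto_added_vllm_args(["--max-model-len", "8192"]))
```

With an empty backendArgs, as in the cURL example, the LoRA limits are applied; once any argument is supplied, only --enable-lora is added in this model.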

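SGLang, version 0.4.6. Startup behaves as follows.

The default scale parameters are as follows.

{
  "max": 5,
  "min": 1,
  "idleTime": 60,
  "rpsValue": 100,
  "inFlightValue": null
}

When backend is sglang, backendVersion is 0.4.6, and no backendArgs are specified, the following parameter is specified automatically at startup: --tp ${gpuCount}, where gpuCount = resource.workers * resource.gpu.count. The parameter is omitted when no GPU information is present.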
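
The gpuCount rule can be worked through with the resource object from the cURL example. This is a minimal sketch of the stated formula; sglang_tp_args is a hypothetical helper, and it assumes "no GPU information" means that resource.gpu is absent or empty.

```python
def sglang_tp_args(resource: dict) -> list:
    """Return ["--tp", "<gpuCount>"] per the stated rule, or [] without GPU info."""
    gpu = resource.get("gpu")
    if not gpu:
        return []  # no GPU information: the parameter is omitted
    gpu_count = resource["workers"] * gpu["count"]
    return ["--tp", str(gpu_count)]


if __name__ == "__main__":
    resource = {
        "workers": 2,
        "cpu": 4,
        "gpu": {"count": 1, "gpuType": "nvidia.com/gpu-l40s"},
        "mem": 10,
    }
    print(sglang_tp_args(resource))  # 2 workers * 1 GPU each
```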