Alaya NeW Cloud

Self-host DeepSeek-V3.2

Deploy the full-parameter DeepSeek-V3.2-Exp model on a Cloud Container Instance (CCI) with SGLang and Chat WebUI.

Availability

CCI currently supports self-hosting DeepSeek-V3.2-Exp only in the Beijing-3 region, and the deployment requires a reservation of 8× H100 or H200 GPUs.

Prerequisites

  • Verified account with approved GPU quota
  • Beijing-3 region selected
  • CCI spec: 8× H100 SXM or equivalent
  • NAS storage mounted (for weight sharing)

Deploy

1. Create the CCI instance

Console → Products → Compute → Cloud Container Instance → Create

Choose the following settings:

| Item | Value |
|---|---|
| Image | alayanew/deepseek-v32-exp:sglang-latest |
| GPU | 8× H100 SXM |
| Network | Default VPC + 10 Mbps public bandwidth |
| Storage | NAS mounted at /root/public |
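Before starting inference, it can help to confirm that the container actually sees the GPUs and the NAS mount chosen above. The following is a minimal sketch, not a tool shipped with the image: count_gpus and preflight are hypothetical helpers, and the sketch assumes nvidia-smi is on the PATH inside the container and that the NAS shows up as a mount point exactly at /root/public.

```python
import os
import shutil
import subprocess

EXPECTED_GPUS = 8           # matches SGLANG_TENSOR_PARALLEL_SIZE=8
NAS_MOUNT = "/root/public"  # NAS mount point chosen when creating the instance


def count_gpus(listing: str) -> int:
    """Count GPUs in `nvidia-smi -L` output (one 'GPU <n>: ...' line per device)."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def preflight(expected_gpus: int = EXPECTED_GPUS, mount: str = NAS_MOUNT) -> list[str]:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found on PATH")
    else:
        result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True, check=False)
        found = count_gpus(result.stdout)
        if found < expected_gpus:
            problems.append(f"expected {expected_gpus} GPUs, found {found}")
    # Assumption: the NAS volume is mounted directly at this path, not at a parent directory.
    if not os.path.ismount(mount):
        problems.append(f"{mount} is not a mount point (is the NAS attached?)")
    return problems


if __name__ == "__main__":
    issues = preflight()
    print("preflight OK" if not issues else "\n".join(issues))
```

An empty result means the instance matches the spec; any message points at the setting to revisit before launching the start script.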

2. Start inference

Inside the container:

```shell
chmod +x /start-llm-inference.sh
sh -c /start-llm-inference.sh
```

The start script launches SGLang with these environment variables:

```shell
SGLANG_SERVER_HOST=0.0.0.0
SGLANG_SERVER_PORT=9001
SGLANG_MODEL_PATH=/root/public/DeepSeek-V3___2-Exp
SGLANG_MODEL_NAME=deepseek-v32-exp
SGLANG_TENSOR_PARALLEL_SIZE=8
SGLANG_API_KEY=sk-12345
```
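The contents of /start-llm-inference.sh are not shown on this page. As a sketch, assuming the script maps these variables onto SGLang's standard launcher flags (an assumption, not confirmed here), the following reads them with the documented values as defaults and derives both a launch command and the in-container API address. SGLangSettings and load_settings are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SGLangSettings:
    host: str
    port: int
    model_path: str
    model_name: str
    tp_size: int
    api_key: str

    def launch_args(self) -> list[str]:
        # Hypothetical: roughly what a start script could pass to SGLang's launcher.
        return [
            "python3", "-m", "sglang.launch_server",
            "--model-path", self.model_path,
            "--served-model-name", self.model_name,
            "--tp-size", str(self.tp_size),
            "--host", self.host,
            "--port", str(self.port),
            "--api-key", self.api_key,
        ]

    def local_base_url(self) -> str:
        # 0.0.0.0 is a bind-all address; clients inside the container connect via loopback.
        host = "127.0.0.1" if self.host == "0.0.0.0" else self.host
        return f"http://{host}:{self.port}/v1"


def load_settings(env: Mapping[str, str] = os.environ) -> SGLangSettings:
    # Defaults mirror the values documented for this image.
    return SGLangSettings(
        host=env.get("SGLANG_SERVER_HOST", "0.0.0.0"),
        port=int(env.get("SGLANG_SERVER_PORT", "9001")),
        model_path=env.get("SGLANG_MODEL_PATH", "/root/public/DeepSeek-V3___2-Exp"),
        model_name=env.get("SGLANG_MODEL_NAME", "deepseek-v32-exp"),
        tp_size=int(env.get("SGLANG_TENSOR_PARALLEL_SIZE", "8")),
        api_key=env.get("SGLANG_API_KEY", "sk-12345"),
    )
```

Note that SGLANG_MODEL_NAME is the value clients must send as "model" in API requests, and the tensor-parallel size of 8 matches the eight GPUs on the instance.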

Pulling the weights for the first time takes about 30 minutes, depending on NAS bandwidth.
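Because the first start can take a while, a small script that waits for the server is often more convenient than retrying by hand. The sketch below polls the OpenAI-compatible /v1/models route with the API key until it answers or a deadline passes. wait_until_ready is a hypothetical helper, and the sketch assumes it runs inside the container, where the server is reachable at http://127.0.0.1:9001/v1.

```python
import http.client
import time
import urllib.request


def wait_until_ready(base_url: str, api_key: str,
                     timeout_s: float = 3600.0, interval_s: float = 15.0) -> bool:
    """Poll GET {base_url}/models until it returns 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    while True:
        remaining = deadline - time.monotonic()
        try:
            # Cap each attempt so a hung connection cannot overrun the deadline by much.
            with urllib.request.urlopen(req, timeout=max(0.1, min(10.0, remaining))) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeouts and HTTP error statuses
            # (URLError and HTTPError are OSError subclasses) all mean "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(min(interval_s, max(0.0, deadline - time.monotonic())))


if __name__ == "__main__":
    ok = wait_until_ready("http://127.0.0.1:9001/v1", "sk-12345")
    print("server ready" if ok else "timed out waiting for the server")
```

The default one-hour timeout leaves headroom over the roughly 30-minute first pull; shorten it for later restarts.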

3. Start Chat WebUI (optional)

```shell
chmod +x /start-chat-webui.sh
sh -c /start-chat-webui.sh
```

The WebUI listens on port 9002.

Use the model

Option 1 — Browser

On the CCI detail page, under "Open ports", expose port 9002, then open the assigned public domain in a browser to use Chat WebUI.

Option 2 — API

The API is OpenAI-compatible. Point base_url at port 9001:

```python
from openai import OpenAI

# Replace <your-cci-domain> with the public domain assigned to your instance.
client = OpenAI(
    base_url="https://<your-cci-domain>:9001/v1",
    api_key="sk-12345",  # must match SGLANG_API_KEY
)

resp = client.chat.completions.create(
    model="deepseek-v32-exp",  # must match SGLANG_MODEL_NAME
    messages=[{"role": "user", "content": "Which is bigger, 9.8 or 9.11?"}],
)
print(resp.choices[0].message.content)
```

The same request with curl:

```shell
curl https://<your-cci-domain>:9001/v1/chat/completions \
  -H "Authorization: Bearer sk-12345" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v32-exp",
    "messages": [{"role": "user", "content": "Which is bigger, 9.8 or 9.11?"}]
  }'
```
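If installing the openai package is not an option, the same chat-completions call can be made with the Python standard library. This is a minimal sketch assuming the standard OpenAI response shape (choices[0].message.content); build_chat_request, extract_reply and chat are hypothetical helper names, and <your-cci-domain> stays a placeholder.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST to {base_url}/chat/completions with a single user message."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return response_body["choices"][0]["message"]["content"]


def chat(base_url: str, api_key: str, model: str, prompt: str, timeout_s: float = 600.0) -> str:
    req = build_chat_request(base_url, api_key, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return extract_reply(json.load(resp))


if __name__ == "__main__":
    print(chat("https://<your-cci-domain>:9001/v1", "sk-12345",
               "deepseek-v32-exp", "Which is bigger, 9.8 or 9.11?"))
```

The generous default timeout is a judgment call: a reasoning-heavy answer from a model of this size can take a while to finish without streaming.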

Ports

| Port | Service | Use |
|---|---|---|
| 9001 | sglang LLM | OpenAI-compatible API |
| 9002 | Chat WebUI | Browser chat |
| 22 | SSH | Debugging |
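To see which of these services are actually up, a plain TCP probe is enough. The sketch below assumes it runs inside the container (probing 127.0.0.1); PORTS, is_listening and port_report are hypothetical names, and a successful connect only shows that something is listening, not that the model has finished loading.

```python
import socket

# Port map from the table above.
PORTS = {9001: "sglang LLM (OpenAI-compatible API)", 9002: "Chat WebUI", 22: "SSH"}


def is_listening(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Refused, unreachable and timed-out connections all raise OSError subclasses.
        return False


def port_report(host: str = "127.0.0.1") -> dict[int, bool]:
    """Map each documented port to whether it currently accepts connections."""
    return {port: is_listening(host, port) for port in PORTS}


if __name__ == "__main__":
    for port, up in port_report().items():
        print(f"{port:>5}  {PORTS[port]:<36} {'listening' if up else 'closed'}")
```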
