Self-host DeepSeek-V3.2
Deploy the full-parameter DeepSeek-V3.2-Exp on a Cloud Container Instance with sglang + Chat WebUI
Availability
CCI currently supports DeepSeek-V3.2-Exp self-hosting only in the Beijing-3 region, and requires an 8× H100 / H200 reservation.
Prerequisites
- Verified account with approved GPU quota
- Beijing-3 region selected
- CCI spec: 8× H100 SXM or equivalent
- NAS storage mounted (for weight sharing)
Deploy
1. Create the CCI instance
Console → Products → Compute → Cloud Container Instance → Create
Pick:
| Item | Value |
|---|---|
| Image | alayanew/deepseek-v32-exp:sglang-latest |
| GPU | 8× H100 SXM |
| Network | Default VPC + 10 Mbps public bandwidth |
| Storage | NAS mounted at /root/public |
2. Start inference
Inside the container:
chmod +x /start-llm-inference.sh
sh -c /start-llm-inference.sh
The container starts sglang with these env vars:
SGLANG_SERVER_HOST=0.0.0.0
SGLANG_SERVER_PORT=9001
SGLANG_MODEL_PATH=/root/public/DeepSeek-V3___2-Exp
SGLANG_MODEL_NAME=deepseek-v32-exp
SGLANG_TENSOR_PARALLEL_SIZE=8
SGLANG_API_KEY=sk-12345
First-time weight pull takes ~30 minutes depending on NAS bandwidth.
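Because the first weight load can take a long time, it may help to poll the server until it answers rather than guessing when it is up. The following is a minimal sketch, not part of the image: the helper names `probe_models` and `wait_until_ready` are hypothetical, and it assumes the server answers `GET /v1/models` over plain HTTP inside the container once loading has finished (the usual behaviour of OpenAI-compatible servers, but not confirmed by this page).

```python
import time
import urllib.error
import urllib.request


def probe_models(base_url, api_key, timeout=5.0):
    """Return True if GET {base_url}/models answers with HTTP 200.

    Assumption: the OpenAI-compatible /models route is served once the
    model has finished loading.
    """
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error: not ready yet.
        return False


def wait_until_ready(probe, max_wait_s=3600.0, interval_s=30.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call probe() every interval_s seconds until it returns True.

    Returns True as soon as the probe succeeds, or False once
    max_wait_s has elapsed. sleep and clock are injectable for testing.
    """
    deadline = clock() + max_wait_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Inside the container this could be called as `wait_until_ready(lambda: probe_models("http://127.0.0.1:9001/v1", "sk-12345"))`, using the port and key from the env vars above. The one-hour default ceiling is an arbitrary choice that leaves headroom over the ~30 minute estimate.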
3. Start Chat WebUI (optional)
chmod +x /start-chat-webui.sh
sh -c /start-chat-webui.sh
WebUI listens on port 9002.
Use the model
Option 1 — Browser
On the CCI detail page, under "Open ports", expose port 9002, then open the assigned public domain in a browser to use Chat WebUI.
Option 2 — API
The server exposes an OpenAI-compatible API. Point base_url at port 9001:
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-cci-domain>:9001/v1",
    api_key="sk-12345",
)
resp = client.chat.completions.create(
    model="deepseek-v32-exp",
    messages=[{"role": "user", "content": "Which is bigger, 9.8 or 9.11?"}],
)
print(resp.choices[0].message.content)
The same request with curl:
curl https://<your-cci-domain>:9001/v1/chat/completions \
  -H "Authorization: Bearer sk-12345" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v32-exp",
    "messages": [{"role": "user", "content": "Which is bigger, 9.8 or 9.11?"}]
  }'
Ports
| Port | Service | Use |
|---|---|---|
| 9001 | sglang LLM | OpenAI-compatible API |
| 9002 | Chat WebUI | Browser chat |
| 22 | SSH | Debugging |
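To confirm which of these ports are actually reachable from a client machine, a plain TCP connect is usually enough. The sketch below is illustrative only: `port_is_open` and `check_ports` are hypothetical helper names, the host is whatever domain or IP the instance was assigned, and a successful connect only shows that something is listening, not that the service behind it is healthy.

```python
import socket

# Ports from the table above, mapped to the service expected on each.
PORTS = {9001: "sglang LLM", 9002: "Chat WebUI", 22: "SSH"}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unresolvable host.
        return False


def check_ports(host, ports=PORTS, timeout=3.0):
    """Return {port: reachable} for every port in the mapping."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

For example, `check_ports("<your-cci-domain>")` returns a dict keyed by port number. A `False` for 9001 or 9002 most likely means the port has not been exposed under "Open ports" or the corresponding start script has not been run.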
