Tech Blog

Field notes from the engineering team — inference tuning, kilo-GPU training, network topology, compliance, compute strategy, and cost case studies. Updated monthly.

May 2026 · 7 posts

Ops · May 7 · 14 min read
InfiniBand Fabric for GPU Clusters — bandwidth, rail-aligned topology, and triage
Why LLM clusters insist on InfiniBand, how a rail-aligned topology saves 30% of training time, and the ibstat / ibdev2netdev / ib_write_bw routine that resolves most field issues.
Alaya Network Engineering
Ops · May 6 · 12 min read
NCCL-tests in practice — the mandatory pre-launch health check
A new cluster comes up. Don't run training yet. One sweep of NCCL-tests tells you whether it can host LLMs at all — here is how to read the busbw.
Alaya Compute Engineering
Tooling · May 5 · 9 min read
Faster HuggingFace model downloads — practical playbook
A 70B checkpoint is 140 GB. Downloading directly from huggingface.co takes 4+ hours; over hf-mirror on a 1 GbE link running close to line rate, it takes about 22 minutes. A working set of options for 2026.
Alaya Compute Engineering
Ops · May 4 · 13 min read
GPU Monitoring & XID triage — six tables for on-call
What to read in nvidia-smi vs DCGM, what XID 79 / 31 / 119 actually mean, and when to RMA. A pocket reference for SREs and escalation engineers.
Alaya Reliability
Tooling · May 3 · 8 min read
Replacing pip / poetry with uv — Python packaging + index setup for AI projects
pip takes 80 s to install a vLLM stack; uv does it in 8 s, and the lockfile is clean and reproducible. Here is the setup we ship to GPU customers.
Alaya Platform Engineering
Tooling · May 2 · 11 min read
Kubernetes for AI Engineers — concepts and a working minimum
Pod, Deployment, Job, PVC, StatefulSet — only the parts you actually use in GPU workflows. Skip the rest until you hit it.
Alaya Platform Engineering
Tooling · May 1 · 9 min read
Docker Essentials & Mirror Setup (2026)
Image pulls time out, Docker Hub rate-limits, public accelerators keep going dark — here is the config we are actually shipping to customers in 2026.
Alaya Platform Engineering

April 2026 · 1 post

Inference · April 22 · 12 min read
Pushing vLLM to 4500 tokens/s on H800A
A single 8×H800A node serving Qwen3-72B-Instruct (quantized). End-to-end notes on paged attention, continuous batching, and KV-cache hit-rate tuning — a full-stack throughput hunt.
Alaya Compute Engineering

March 2026 · 1 post

Training · March 15 · 18 min read
Network topology for kilo-GPU training — from Fat-Tree to Dragonfly+
Why does AllReduce tail latency scale non-linearly with cluster size? With NCCL topology-aware rewrites, we lifted a 1024-GPU cluster from 71% to 89% MFU.
Alaya Network & Systems

February 2026 · 1 post

Compliance · February 8 · 9 min read
Lifting multi-tenant isolation to Confidential Compute grade
MIG + Confidential VM, GID-pinned RDMA, NVMe cryptographic erase — the isolation stack we shipped so finance and government audits pass clean.
Alaya Security Engineering

January 2026 · 1 post

Strategy · January 13 · 14 min read
A practitioner's guide to AI compute selection
From workload tiering and the accelerator quadrant to a five-axis vendor scorecard — upgrading from "TFLOPS per dollar" to "TCO per workload". A condensed 50-page guide.
Alaya Intelligence Research

December 2025 · 1 post

Cost · December 30 · 16 min read
A CXO guide to reducing AI compute cost
Why does cloud migration sometimes push bills up 40%? Why does long-run GPU utilization sit below 30%? A CXO TCC framework, seven KPIs, and five industry case studies (top single-point saving: 60%).
Alaya Intelligence Research


Copyright © 2024-2026 DataCanvas
