VKS Product Overview
Virtual Kubernetes Service (VKS) delivers integrated computing resources and a comprehensive suite of tools designed to help users run high-performance computing (HPC) tasks efficiently. With the inherently elastic scaling of VKS, users can dynamically adjust computing resource configurations in real time based on parallel-computing requirements, data intensity, task progress, and resource utilization, ensuring optimal performance, cost-efficiency, and alignment with task demands.
GPU Computing Core Resources
- High Performance Computing: Powered by the latest GPU technologies, VKS delivers exceptional floating-point compute, making it an ideal fit for massively parallel workloads such as deep learning and scientific simulation (see the pod-spec sketch after this list).
- High Performance Storage: Purpose-built for large-model scenarios, the Alaya NeW storage infrastructure pairs a robust multi-center data platform, NeW Dingo, with model-centric innovations such as multi-site storage, corpus-specific compression algorithms, materialized filesystem views, seamless integration with vector databases, and native security policies. These advances yield up to 70%-90% savings in storage space, a 50% reduction in network I/O, and up to a 10x increase in corpus processing speed.
- High Performance Networking: Leveraging highly available GPU clusters and specialized accelerators, together with advanced distributed computing frameworks, the networking layer of VKS supports efficient large-scale data processing and complex model-training scenarios.
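Because VKS is a Kubernetes service, GPU capacity is typically consumed through standard pod resource requests. The sketch below uses the official Kubernetes Python client to request four GPUs via the standard NVIDIA device-plugin resource name; the namespace, image, and GPU count are illustrative assumptions, and VKS may additionally expose vendor-specific resource names of its own.

```python
# Minimal sketch: requesting GPU compute on a Kubernetes-based service
# such as VKS, using the official Kubernetes Python client. The
# namespace, image, and GPU count are assumptions for illustration.
from kubernetes import client, config

config.load_kube_config()  # read cluster credentials from ~/.kube/config

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="hpc-training-job"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="trainer",
                image="pytorch/pytorch:latest",   # hypothetical training image
                command=["python", "train.py"],
                resources=client.V1ResourceRequirements(
                    # "nvidia.com/gpu" is the standard NVIDIA
                    # device-plugin resource name
                    limits={"nvidia.com/gpu": "4"}
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```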
AI Data Center Operating System
- With state-of-the-art heterogeneous management capabilities, VKS delivers full support for GPU resources across vendors. Cutting-edge network communication algorithms optimized for IB and RoCE architectures, high-performance storage engineered for large models, and a serverless elastic HPC architecture combine to deliver transparent resource scheduling and management. This empowers users to focus solely on core AI training and inference tasks, eliminating the operational overhead of hardware management.
- The platform enables cross-center intelligent compute-cluster scheduling, integrating both full-featured and lightweight kernels to centrally manage diverse compute centers, clusters, and GPU cloud services. For large-scale AI workloads, Alaya NeW Cloud provides dedicated scheduling algorithms and strategies for AI acceleration. Automated O&M features, including failure-aware and topology-aware scheduling, gang scheduling (illustrated in the sketch after this list), and dynamic fair scheduling, further improve compute availability.
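To make the gang-scheduling strategy above concrete, here is a minimal, self-contained sketch of the all-or-nothing placement idea: a distributed job is admitted only when every worker in the gang fits, so partially allocated jobs never deadlock the cluster. The Cluster and Job structures are hypothetical illustrations, not VKS scheduler internals.

```python
# Conceptual sketch of gang scheduling: admit a distributed job only if
# *all* of its workers can be placed at once; otherwise place nothing,
# so jobs never hold partial allocations that starve each other.
from dataclasses import dataclass

@dataclass
class Cluster:
    free_gpus: dict[str, int]  # node name -> free GPU count

@dataclass
class Job:
    name: str
    workers: int          # number of pods in the gang
    gpus_per_worker: int

def gang_schedule(cluster: Cluster, job: Job) -> dict[str, int] | None:
    """Return a node -> worker-count placement, or None if the full
    gang does not fit (in which case nothing is committed)."""
    placement: dict[str, int] = {}
    remaining = job.workers
    # Greedily pack workers onto the emptiest nodes first.
    for node, free in sorted(cluster.free_gpus.items(), key=lambda kv: -kv[1]):
        fit = min(remaining, free // job.gpus_per_worker)
        if fit > 0:
            placement[node] = fit
            remaining -= fit
        if remaining == 0:
            break
    if remaining > 0:
        return None  # all-or-nothing: refuse to admit a partial gang
    for node, count in placement.items():  # commit only on full placement
        cluster.free_gpus[node] -= count * job.gpus_per_worker
    return placement

cluster = Cluster(free_gpus={"node-a": 8, "node-b": 4})
print(gang_schedule(cluster, Job("llm-pretrain", workers=3, gpus_per_worker=4)))
# -> {'node-a': 2, 'node-b': 1}
```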
Large Language Model Support
VKS is compatible with leading third-party foundation models, including DeepSeek, Qwen, LLaMA, and ChatGLM.
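As an example of what that compatibility looks like in practice, a GPU-backed container on VKS can load one of these models with a standard framework such as Hugging Face Transformers. The model checkpoint and generation settings below are illustrative assumptions; any of the listed model families would follow the same pattern.

```python
# Minimal sketch: loading a third-party foundation model inside a
# GPU-backed VKS container with Hugging Face Transformers. The model
# ID is an illustrative choice; DeepSeek, LLaMA, or ChatGLM checkpoints
# load the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Explain elastic GPU scaling in one sentence.",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```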
Serverless
- Development-Time Serverless: Harnessing the synergy of compute packages and a robust serverless architecture, the purpose-built Aladdin (Alaya AI Addin) provides seamless integration between local development environments and VKS. Developers can leverage cloud-scale compute without infrastructure concerns, achieving a smooth, productive workflow for both complex model training and large-scale data processing.
- Training-Time Serverless: Open API, built on VKS's streamlined management and fast cold-start capability, is tailored for AI training and fine-tuning scenarios. It simplifies and accelerates the end-to-end journey from model development to optimization: developers can rapidly deploy and scale compute resources for efficient training, with seamless integration and automated management supporting both small-scale experiments and large-scale production deployments.
- Inference-Time Serverless: Designed to manage inference tasks for machine learning and deep learning models, VKS automatically orchestrates and scales execution environments in response to workload demand, letting developers concentrate on model logic and optimization (a conceptual autoscaling sketch follows this list).
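As a rough illustration of the inference-time behavior described above, the sketch below derives a replica count from queued request depth, scaling to zero when idle. The function name, the queue-depth heuristic, and the per-replica concurrency figure are all assumptions for illustration, not the VKS autoscaler itself.

```python
# Conceptual sketch of the scale-to-demand decision an inference-time
# serverless platform makes. All names and thresholds here are
# hypothetical illustrations, not VKS internals.
import math

def desired_replicas(pending_requests: int,
                     requests_per_replica: int,
                     min_replicas: int = 0,
                     max_replicas: int = 32) -> int:
    """Scale replicas to the pending load; scale to zero when idle."""
    if pending_requests == 0:
        return min_replicas  # release GPUs entirely when there is no traffic
    needed = math.ceil(pending_requests / requests_per_replica)
    return max(min_replicas, min(needed, max_replicas))

# Example: a burst of 90 queued requests, with each replica serving
# about 20 concurrent requests, scales the endpoint out to 5 replicas.
print(desired_replicas(pending_requests=90, requests_per_replica=20))  # -> 5
```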
With these features, Alaya NeW VKS offers a powerful and flexible platform tailored to a variety of users. By leveraging this platform, users can make full use of elastic resource scaling, optimize their workflows, and transform their ideas into real-world applications more efficiently than ever—empowering both individual developers and enterprises to innovate at speed and scale.