RDMA
Remote Direct Memory Access (RDMA) is a high-performance networking technology that allows one computer to directly access the memory of another without involving the remote CPU, interrupts, or the operating system kernel. This capability significantly reduces network latency and CPU load, making it well-suited for workloads that require low latency and high throughput.
Key Features
- Low Latency: Bypasses the operating system kernel, minimizing delays along the data transfer path.
- High Bandwidth: Can drive the underlying network hardware at close to its full line rate.
- Low CPU Overhead: Data movement is offloaded to the network adapter, freeing compute resources for other tasks.
- Zero-Copy: Data is transferred directly between application buffers without intermediate copies (see the sketch after this list).
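The zero-copy and kernel-bypass properties are visible directly in the RDMA verbs programming model: an application registers its own buffers with the NIC, which can then read and write them without staging copies in kernel socket buffers. Below is a minimal, illustrative C sketch using libibverbs that only discovers a device and registers a buffer; queue-pair setup and the actual data transfer are omitted, and the file name and build command are assumptions.

/*
 * Minimal sketch (illustrative only): enumerate RDMA devices and register an
 * application buffer with libibverbs. Registering memory is what enables
 * zero-copy transfers: the NIC can access the buffer directly, without
 * intermediate kernel copies. Queue-pair creation, connection setup, and the
 * actual RDMA read/write are omitted.
 * Assumed build command: gcc rdma_sketch.c -o rdma_sketch -libverbs
 */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num_devices = 0;
    struct ibv_device **devs = ibv_get_device_list(&num_devices);
    if (!devs || num_devices == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }
    printf("found %d RDMA device(s); using %s\n",
           num_devices, ibv_get_device_name(devs[0]));

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx) {
        fprintf(stderr, "ibv_open_device failed\n");
        return 1;
    }
    struct ibv_pd *pd = ibv_alloc_pd(ctx); /* protection domain */
    if (!pd) {
        fprintf(stderr, "ibv_alloc_pd failed\n");
        return 1;
    }

    /* Register a plain application buffer; the resulting memory region (MR)
     * carries the keys (lkey/rkey) the NIC uses to access it directly. */
    size_t len = 4096;
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        perror("ibv_reg_mr");
        return 1;
    }
    printf("registered %zu bytes, rkey=0x%x\n", len, mr->rkey);

    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}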
Implementation Options
There are three mainstream RDMA technologies (a short sketch for telling them apart programmatically follows the list):
- InfiniBand (IB):
- A networking architecture designed specifically for high-performance computing.
- Delivers extremely low latency and high bandwidth.
- Requires dedicated hardware (switches, NICs, etc.).
- RoCE (RDMA over Converged Ethernet):
- Enables RDMA over standard Ethernet.
- Runs on existing Ethernet infrastructure, but requires switches that support Data Center Bridging (DCB).
- RoCE includes two variants: RoCEv1, which is limited to Layer 2, and RoCEv2, which extends support to Layer 3 routing.
- iWARP (Internet Wide Area RDMA Protocol):
- Enables RDMA over TCP/IP.
- Operates on standard IP networks, though performance may not match InfiniBand or RoCE.
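As a complement to the list above, here is a small C sketch (not specific to VKS) showing how an application can tell which of these technologies a locally visible RDMA device uses, via the port link layer and device transport type reported by libibverbs. The file name, the build command, and the assumption that port 1 is the port of interest are illustrative.

/*
 * Sketch (illustrative only): classify the RDMA devices visible on a host.
 * The port's link layer separates InfiniBand from Ethernet-based RDMA, and
 * the device transport type separates RoCE (InfiniBand transport over
 * Ethernet) from iWARP (TCP/IP transport). Port number 1 is assumed.
 * Assumed build command: gcc rdma_classify.c -o rdma_classify -libverbs
 */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int n = 0;
    struct ibv_device **devs = ibv_get_device_list(&n);
    if (!devs)
        return 1;

    for (int i = 0; i < n; i++) {
        struct ibv_context *ctx = ibv_open_device(devs[i]);
        if (!ctx)
            continue;

        struct ibv_port_attr port;
        if (ibv_query_port(ctx, 1, &port) == 0) { /* query port 1 */
            const char *kind = "unknown";
            if (port.link_layer == IBV_LINK_LAYER_INFINIBAND)
                kind = "InfiniBand";
            else if (port.link_layer == IBV_LINK_LAYER_ETHERNET)
                kind = (devs[i]->transport_type == IBV_TRANSPORT_IWARP)
                           ? "iWARP" : "RoCE";
            printf("%s: %s\n", ibv_get_device_name(devs[i]), kind);
        }
        ibv_close_device(ctx);
    }
    ibv_free_device_list(devs);
    return 0;
}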
Using RDMA in VKS
Virtual Kubernetes Services (VKS) supports cross-node RDMA. To enable RDMA for your workload, add the RDMA device resources to the resource requests and limits in your container's YAML manifest. The following device resources are supported:
- rdma/rdma_shared_device_a: 1
- rdma/rdma_shared_device_b: 1
Example:
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-kuberay
spec:
  rayVersion: '2.40.0' # should match the Ray version in the image of the containers
  # Ray head pod template
  headGroupSpec:
    rayStartParams: {}
    # pod template
    template:
      spec:
        containers:
          - name: ray-head
            image: registry.hd-01.alayanew.com:8443/vc-app_market/ray-ml-vllm:0.7.1
            resources:
              requests:
                memory: "1600G"
                cpu: "144"
                nvidia.com/gpu-h800: 8 # Request 8 GPUs
                rdma/rdma_shared_device_a: 1 # RDMA device
                rdma/rdma_shared_device_b: 1 # RDMA device
              limits:
                memory: "1600G"
                cpu: "144"
                nvidia.com/gpu-h800: 8 # Limit of 8 GPUs
                rdma/rdma_shared_device_a: 1 # RDMA device
                rdma/rdma_shared_device_b: 1 # RDMA device
  workerGroupSpecs:
    # number of pod replicas in this worker group
    - replicas: {{ .Values.raycluster.workerGroupSpecs.replicas }}
      # logical group name; here it is workergroup, but any descriptive name can be used
      groupName: workergroup
      rayStartParams: {}
      # pod template
      template:
        spec:
          containers:
            - name: ray-worker # must consist of lower-case alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name' or '123-abc')
              image: registry.hd-01.alayanew.com:8443/vc-app_market/ray-ml-vllm:0.7.1
              resources:
                requests:
                  memory: "1600G"
                  cpu: "144"
                  nvidia.com/gpu-h800: 8 # Request 8 GPUs
                  rdma/rdma_shared_device_a: 1 # RDMA device
                  rdma/rdma_shared_device_b: 1 # RDMA device
                limits:
                  memory: "1600G"
                  cpu: "144"
                  nvidia.com/gpu-h800: 8 # Limit of 8 GPUs
                  rdma/rdma_shared_device_a: 1 # RDMA device
                  rdma/rdma_shared_device_b: 1 # RDMA device