RDMA
Remote Direct Memory Access (RDMA) is a high-performance networking technology that allows one computer to directly access the memory of another without involving the remote CPU, interrupts, or the operating system kernel. This mechanism significantly reduces network latency and CPU load, making it ideal for scenarios that require low latency and high throughput.
Key Features
- Low Latency: Bypasses the operating system kernel, reducing delays along the data transfer path.
- High Bandwidth: Fully leverages the maximum bandwidth of the underlying network hardware.
- Low CPU Overhead: Since data transfer does not rely on the CPU, compute resources can be freed for other tasks.
- Zero-Copy: Data can be transferred directly between application buffers without intermediate copies.
Implementation Options
There are three mainstream RDMA technologies:
- InfiniBand (IB):
- A networking protocol designed specifically for high-performance computing.
- Provides extremely low latency and high bandwidth.
- Requires dedicated hardware (switches, NICs, etc.).
- RoCE (RDMA over Converged Ethernet):
- Enables RDMA on standard Ethernet.
- Can run on existing Ethernet infrastructure but requires switches supporting Data Center Bridging (DCB).
- RoCE has two variants: RoCEv1, which is limited to Layer 2, and RoCEv2, which adds support for Layer 3 routing.
- iWARP (Internet Wide Area RDMA Protocol):
- Runs RDMA over TCP/IP.
- Works on standard IP networks, though performance may not match InfiniBand or RoCE.
Using RDMA in VKS
VKS (Virtual Kubernetes Services) supports RDMA across nodes. To enable RDMA for your workload, request the RDMA shared devices in the resources section of the container spec in your YAML file (a minimal sketch follows the list below). The supported resource names are:
rdma/rdma_shared_device_a: 1
rdma/rdma_shared_device_b: 1
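For instance, a minimal Pod spec that requests both shared devices could look like the sketch below. The pod name, container name, and image are placeholders for illustration; only the rdma/... resource names are taken from the list above. Note that extended resources such as these cannot be overcommitted, so the values under requests and limits must be equal.

apiVersion: v1
kind: Pod
metadata:
  name: rdma-test # placeholder name
spec:
  containers:
    - name: app # placeholder container name
      image: registry.hd-01.alayanew.com:8443/vc-app_market/ray-ml-vllm:0.7.1 # image from the example below; any image with the required RDMA userspace libraries works
      resources:
        requests:
          rdma/rdma_shared_device_a: 1
          rdma/rdma_shared_device_b: 1
        limits:
          rdma/rdma_shared_device_a: 1
          rdma/rdma_shared_device_b: 1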
A complete RayCluster example:
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-kuberay
spec:
  rayVersion: '2.40.0' # should match the Ray version in the container image
  # Ray head pod template
  headGroupSpec:
    rayStartParams: {}
    # Pod template
    template:
      spec:
        containers:
          - name: ray-head
            image: registry.hd-01.alayanew.com:8443/vc-app_market/ray-ml-vllm:0.7.1
            resources:
              requests:
                memory: "1600G"
                cpu: "144"
                nvidia.com/gpu-h800: 8 # Request 8 GPUs
                rdma/rdma_shared_device_a: 1 # RDMA configuration
                rdma/rdma_shared_device_b: 1 # RDMA configuration
              limits:
                memory: "1600G"
                cpu: "144"
                nvidia.com/gpu-h800: 8 # Limit of 8 GPUs
                rdma/rdma_shared_device_a: 1 # RDMA configuration
                rdma/rdma_shared_device_b: 1 # RDMA configuration
  workerGroupSpecs:
    # Number of worker pod replicas in this group
    - replicas: {{ .Values.raycluster.workerGroupSpecs.replicas }}
      # Logical group name for this worker group
      groupName: workergroup
      rayStartParams: {}
      # Pod template
      template:
        spec:
          containers:
            - name: ray-worker # must consist of lowercase alphanumeric characters or '-', and must start and end with an alphanumeric character (e.g. 'my-name' or '123-abc')
              image: registry.hd-01.alayanew.com:8443/vc-app_market/ray-ml-vllm:0.7.1
              resources:
                requests:
                  memory: "1600G"
                  cpu: "144"
                  nvidia.com/gpu-h800: 8 # Request 8 GPUs
                  rdma/rdma_shared_device_a: 1 # RDMA configuration
                  rdma/rdma_shared_device_b: 1 # RDMA configuration
                limits:
                  memory: "1600G"
                  cpu: "144"
                  nvidia.com/gpu-h800: 8 # Limit of 8 GPUs
                  rdma/rdma_shared_device_a: 1 # RDMA configuration
                  rdma/rdma_shared_device_b: 1 # RDMA configuration
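After applying the manifest (for example with kubectl apply -f), you can confirm that the RDMA devices were allocated by inspecting the pod with kubectl describe pod, or verify that the nodes advertise rdma/rdma_shared_device_a and rdma/rdma_shared_device_b among their allocatable resources with kubectl describe node.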