九章智算云

Deploying a Jupyter Remote Development Environment

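Prerequisites

  • This deployment uses Kubernetes. Make sure the Kubernetes command-line client kubectl is available locally; for installation, see "Install kubectl".

  • You have activated an elastic container cluster and it is working normally. If you have not, follow "Activate an Elastic Container Cluster" to do so.

  • You have installed the Aladdin plugin; for the steps, see the "Install Aladdin" chapter.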

Preparation

Download the source files

Download the source files used in this example. The files and their purposes are listed below.

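File name | Purpose
Dockerfile | Builds the Docker image.
config_harbor_secret.json | Holds the sensitive connection details for the Harbor container image registry.
jupyter_harbor_secret.yaml | Defines the Secret resource used to pull the custom image when the Deployment is created.
jupyter_deploy.yaml | Defines the Deployment resource, which controls how the Pod is started and stopped.
jupyter_svc.yaml | Defines the Service resource, which handles networking and exposes the service.
jupyter_serviceexport.yaml | Defines the ServiceExporter resource, which publishes the service to the public internet.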

Source file walkthrough

  • Dockerfile: builds a custom image on top of a PyTorch base image. The file contents are shown below.
Dockerfile contents
# Use the official PyTorch image as the base image
FROM pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel

# Install additional Python packages
RUN pip install --no-cache-dir jupyterlab pandas matplotlib

# Set the working directory
WORKDIR /workspace

# Set the JUPYTER_DATA_DIR environment variable
ENV JUPYTER_DATA_DIR=/workspace/.jupyter

# Expose the default Jupyter port
EXPOSE 8888

# Start JupyterLab
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888", "--allow-root", "--no-browser"]
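  • config_harbor_secret.json: holds the basic connection details for the Harbor image registry used in this example. The file contents are shown below; replace the placeholder values with your own.
config_harbor_secret.json contents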
{
  "auths": {
    "your_harbor_server": {
      "username": "your_username",
      "password": "your_password",
      "email": "your_email"
    }
  }
}
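Variable | Description | Source | Example
your_harbor_server | Harbor server address | Resource Center / Storage Management / Image Registry page | https://registry.hd-01.alayanew.com:8443
username | Harbor login username | Activation SMS | user
password | Harbor login password | Activation SMS | password
email | User email address | - | user@example.com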
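  • jupyter_harbor_secret.yaml: defines a Secret, the Kubernetes resource for storing and managing sensitive data such as passwords, API keys, and certificates. Here it holds the Harbor registry credentials. The file contents are shown below; replace the value with your own.
jupyter_harbor_secret.yaml contents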
apiVersion: v1
kind: Secret
metadata:
  name: harbor-secret
  namespace: jupyter
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: ewogICJhdXRocyI6IHsKICAgICJyZWdpc3RyeS5oZC0wMS5hbGF5YW5ldy5jb206ODQ0MyI6IHsKICAgICAgInVzZXJuYW1lIjogInZjLWh1YW5neHMiLAogICAgICAicGFzc3dvcmQiOiAiQWJjMTIzNDU2IiwKICAgICAgImVtYWlsIjogImh1YW5neHNAemV0eXVuLmNvbSIKICAgIH0KICB9Cn0K
Variable | Description | Source | Example
.dockerconfigjson | The base64 encoding of config_harbor_secret.json | Encoded manually | 0ssdxkcjuielsdjf...
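  • jupyter_deploy.yaml: specifies the deployment. The file contents are shown below; replace the values listed in the table after the file with your own.
jupyter_deploy.yaml contents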
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jupyter-deploy
  namespace: jupyter
  labels:
    app: jy
spec:
  replicas: 1  
  selector:
    matchLabels:
      app: jy
  template:
    metadata:
      labels:
        app: jy
    spec:
      restartPolicy: Always
      securityContext:
      containers:
        - name: sd-cuda-container
          image: <image-registry-address>/pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel-ssh-1.0 # replace with your own image
          resources:
            requests:
              memory: "4Gi"
              cpu: "500m"
              nvidia.com/gpu-h800: 1 # replace with your cluster's GPU resource name
            limits:
              memory: "8Gi"
              cpu: "1000m"
              nvidia.com/gpu-h800: 1 # replace with your cluster's GPU resource name
          ports:
            - containerPort: 8888
              name: http-port
              protocol: TCP
          volumeMounts:
            - name: workspace
              mountPath: "/workspace"
              subPath: "jupyter/workspace"
      imagePullSecrets:
        - name: harbor-secret
      volumes:
        - name: workspace
          persistentVolumeClaim:
            claimName: pvc-capacity-userdata
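Variable | Description | Source | Example
image | Image name | Your custom image | registry.hd-01.alayanew.com:8443/alayanew-dab57f9b-35f5-4dc1-afff-5cfd02esdsfe/pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel-ssh-1.0
resources.requests.[GPU] | GPU resource name | Elastic Container Cluster / Cluster Details / Compute Configuration | nvidia.com/gpu-h800
volumes.persistentVolumeClaim.claimName | PVC name | The PVC created by default; see "Declare Storage" | pvc-capacity-userdata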
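  • jupyter_svc.yaml: specifies the Service. This example defines a ClusterIP Service that routes traffic inside the cluster to the Pod on TCP port 8888. A ClusterIP Service is not reachable from outside the cluster on its own; publishing it to the public internet is the job of the ServiceExporter in jupyter_serviceexport.yaml.
jupyter_svc.yaml contents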
apiVersion: v1
kind: Service
metadata:
  name: jupyter-svc
  namespace: jupyter
spec:
  selector:
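    # The selector tells the Service which Pods to route traffic to
    # Make sure the Pod's metadata.labels match this selector
    app: jy
  ports:
    - protocol: TCP
      port: 8888 # Port the Service exposes on its cluster IP
      targetPort: 8888 # Port inside the Pod's container
  type: ClusterIP # Service type; ClusterIP is reachable only from inside the cluster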
  • jupyter_serviceexport.yaml: defines the ServiceExporter resource, which publishes the Service to the public internet.
jupyter_serviceexport.yaml contents
# Publish the service outside the vcluster
apiVersion: osm.datacanvas.com/v1alpha1
kind: ServiceExporter
metadata:
  name: jupyter-svc # immutable
  namespace: jupyter
spec:
  serviceName: jupyter-svc # required
  servicePort: 8888
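
The .dockerconfigjson value in jupyter_harbor_secret.yaml is the base64 encoding of config_harbor_secret.json. As a minimal sketch, assuming Python 3 is available locally, the value could be produced as follows. The function name encode_docker_config and the sample values are hypothetical placeholders and are not part of the original source files.

```python
import base64
import json


def encode_docker_config(server: str, username: str, password: str, email: str) -> str:
    """Build the config_harbor_secret.json structure and return it base64-encoded."""
    config = {
        "auths": {
            server: {
                "username": username,
                "password": password,
                "email": email,
            }
        }
    }
    raw = json.dumps(config, indent=2).encode("utf-8")
    # b64encode returns a single line, which is what the Secret's data field needs.
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # Placeholder values for illustration only; substitute your own.
    print(encode_docker_config(
        "your_harbor_server", "your_username", "your_password", "your_email"
    ))
```

The printed string is what goes into the .dockerconfigjson field. Encoding an existing config_harbor_secret.json with a base64 command-line tool should give an equivalent result.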

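Procedure

Configure the image

  1. Run the following command to pull the specified image from the remote registry (Docker Hub by default) to your local machine, as shown at highlight ① in the figure below.

    docker pull <image-name>:<tag>
  2. Run the following command to build a new image from the Dockerfile and give it a unique name and tag, as shown at highlight ② in the figure below.

    docker build -t <name>:<tag> -f <path-to-Dockerfile> .

    Figure: docker pull / build / login

  3. Run the following command to log in to the private Harbor image registry, as shown at highlight ③ in the figure above.

    echo <password> | docker login <registry-address> -u <username> --password-stdin
  4. Run the following command to give the existing local image a new tag, as shown at highlight ① in the figure below.

    docker tag <source-image>:<source-tag> <target-image>:<target-tag>
  5. Run the following command to push the tagged image to the target registry, as shown at highlight ② in the figure below.

    docker push <target-image>:<target-tag>

    Figure: docker tag / push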
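
The target image name used by docker tag and docker push has to include the registry address and the project path, in the same form as the image example for jupyter_deploy.yaml. The sketch below shows one way to compose such a reference. The helper name target_image_ref and the project placeholder are hypothetical; the only behavior assumed is that an image reference carries no URL scheme, so a leading https:// copied from the registry page is dropped.

```python
def target_image_ref(registry: str, project: str, repository: str, tag: str) -> str:
    """Compose <registry>/<project>/<repository>:<tag> for docker tag / docker push."""
    # Image references carry no URL scheme, so drop a leading http:// or https://.
    for scheme in ("https://", "http://"):
        if registry.startswith(scheme):
            registry = registry[len(scheme):]
    return f"{registry.rstrip('/')}/{project.strip('/')}/{repository.strip('/')}:{tag}"


if __name__ == "__main__":
    print(target_image_ref(
        "https://registry.hd-01.alayanew.com:8443",
        "<your-project>",  # hypothetical placeholder for your own project path
        "pytorch/pytorch",
        "2.5.1-cuda12.4-cudnn9-devel-ssh-1.0",
    ))
```

The resulting string can be used as the target in steps 4 and 5 and as the image value in jupyter_deploy.yaml.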

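Deploy the resources

  1. Run the following command to point kubectl at the elastic container cluster's configuration, as shown at highlight ① in the figure below.

    export KUBECONFIG="</path/to/kubeconfig>"
  2. Run the following command to create a Namespace, as shown at highlight ② in the figure below. The manifests in this example set namespace: jupyter, so use jupyter as the name unless you edit them.

    kubectl create namespace <namespace-name>
  3. Run the following command to apply the resources defined in jupyter_harbor_secret.yaml to the Kubernetes cluster, as shown at highlight ③ in the figure below.

    kubectl apply -f jupyter_harbor_secret.yaml
  4. Run the following command to apply the resources defined in jupyter_deploy.yaml to the Kubernetes cluster, as shown at highlight ④ in the figure below.

    kubectl apply -f jupyter_deploy.yaml
  5. Run the following command to apply the resources defined in jupyter_svc.yaml to the Kubernetes cluster, as shown at highlight ⑤ in the figure below.

    kubectl apply -f jupyter_svc.yaml
  6. Run the following command to apply the resources defined in jupyter_serviceexport.yaml to the Kubernetes cluster, as shown at highlight ⑥ in the figure below.

    kubectl apply -f jupyter_serviceexport.yaml

    Figure: Deploying the resources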
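
The order of the steps above is deliberate: the Namespace has to exist before the namespaced resources are applied, and the Secret is applied before the Deployment that references it. As an illustrative sketch only, the sequence could be assembled in Python as below. The function name deployment_commands is hypothetical, and the script only prints the commands rather than running them.

```python
import shlex

# Manifests from this guide, in the order the steps apply them.
MANIFESTS = [
    "jupyter_harbor_secret.yaml",
    "jupyter_deploy.yaml",
    "jupyter_svc.yaml",
    "jupyter_serviceexport.yaml",
]


def deployment_commands(namespace: str, manifests=MANIFESTS) -> list[list[str]]:
    """Return the kubectl invocations for steps 2 to 6 as argument lists."""
    commands = [["kubectl", "create", "namespace", namespace]]
    commands += [["kubectl", "apply", "-f", manifest] for manifest in manifests]
    return commands


if __name__ == "__main__":
    for argv in deployment_commands("jupyter"):
        # Print only; pass argv to subprocess.run(argv, check=True) to execute.
        print(shlex.join(argv))
```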

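View the resources

  1. Run the following command to list all Pods in the namespace, as shown at highlight ① in the figure below.

    kubectl get pods -n <namespace>
  2. Run the following command to list all Deployments in the namespace, as shown at highlight ② in the figure below.

    kubectl get deploy -n <namespace>
  3. Run the following command to list all Services in the namespace, as shown at highlight ② in the figure below.

    kubectl get svc -n <namespace>

    Figure: Viewing Pods / Services

  4. Run the following command to show the detailed description of a Pod in the namespace, replacing <pod-name> with the Pod name returned in step 1. Sample output is shown under "Pod description".

    kubectl describe pod <pod-name> -n jupyter
    Pod description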
    Name:             jupyter-deploy-64b8b56664-f8x2g
    Namespace:        jupyter
    Priority:         0
    Service Account:  default
    Node:             k8s-mas-gpu-8-128/100.64.8.128
    Start Time:       Wed, 19 Mar 2025 17:50:25 +0800
    Labels:           app=jy
                    pod-template-hash=64b8b56664
    Annotations:      <none>
    Status:           Running
    IP:               172.19.32.233
    IPs:
    IP:           172.19.32.233
    Controlled By:  ReplicaSet/jupyter-deploy-64b8b56664
    Containers:
    sd-cuda-container:
        Container ID:   containerd://2ec5cd2f1b5d3de081da49d72f1a565667850d026c4a6f41e4b206cc2567675a
        Image:          registry.hd-01.alayanew.com:8443/vc-app_market/pytorch/pytorch:2.5.1-cuda12.4-cudnn9-jupyter-devel
        Image ID:       registry.hd-01.alayanew.com:8443/vc-app_market/pytorch/pytorch@sha256:dd33a84d2d6f60c3343697c044135e75bb59e267255f1e99dab30d7ed07a389a
        Port:           8888/TCP
        Host Port:      0/TCP
        State:          Running
        Started:      Wed, 19 Mar 2025 17:50:26 +0800
        Ready:          True
        Restart Count:  0
        Limits:
        cpu:                            1
        memory:                         8Gi
        nvidia.com/gpu-h100-80gb-hbm3:  1
        Requests:
        cpu:                            500m
        memory:                         4Gi
        nvidia.com/gpu-h100-80gb-hbm3:  1
        Environment:                      <none>
        Mounts:
        /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-56ls8 (ro)
        /workspace from workspace (rw,path="jupyter/workspace")
    Conditions:
    Type              Status
    Initialized       True
    Ready             True
    ContainersReady   True
    PodScheduled      True
    Volumes:
    workspace:
        Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
        ClaimName:  pvc-capacity-userdata
        ReadOnly:   false
    kube-api-access-56ls8:
        Type:                    Projected (a volume that contains injected data from multiple sources)
        TokenExpirationSeconds:  3607
        ConfigMapName:           kube-root-ca.crt
        ConfigMapOptional:       <nil>
        DownwardAPI:             true
    QoS Class:                   Burstable
    Node-Selectors:              <none>
    Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                                node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                                node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
    Events:                      <none>
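  5. Run the following command to view the Pod's logs and obtain the login token, replacing <pod-name> with your Pod's name. The token appears as part of the server URL, in the token= query parameter. Sample output is shown under "Pod logs". In this example the token is ddf27395439c1197201cf7fa6d8e350e72c80d67f3a1d831

    kubectl logs <pod-name> -n jupyter
    Pod logs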
    ==========
    == CUDA ==
    ==========
    
    CUDA Version 12.4.1
    
    Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
    
    This container image and its contents are governed by the NVIDIA Deep Learning Container License.
    By pulling and using the container, you accept the terms and conditions of this license:
    https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
    
    A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
    
    [I 2025-03-19 09:50:27.049 ServerApp] jupyter_lsp | extension was successfully linked.
    [I 2025-03-19 09:50:27.051 ServerApp] jupyter_server_terminals | extension was successfully linked.
    [I 2025-03-19 09:50:27.054 ServerApp] jupyterlab | extension was successfully linked.
    [I 2025-03-19 09:50:27.223 ServerApp] notebook_shim | extension was successfully linked.
    [I 2025-03-19 09:50:27.233 ServerApp] notebook_shim | extension was successfully loaded.
    [I 2025-03-19 09:50:27.234 ServerApp] jupyter_lsp | extension was successfully loaded.
    [I 2025-03-19 09:50:27.235 ServerApp] jupyter_server_terminals | extension was successfully loaded.
    [I 2025-03-19 09:50:27.235 LabApp] JupyterLab extension loaded from /opt/conda/lib/python3.11/site-packages/jupyterlab
    [I 2025-03-19 09:50:27.235 LabApp] JupyterLab application directory is /opt/conda/share/jupyter/lab
    [I 2025-03-19 09:50:27.236 LabApp] Extension Manager is 'pypi'.
    [I 2025-03-19 09:50:27.264 ServerApp] jupyterlab | extension was successfully loaded.
    [I 2025-03-19 09:50:27.264 ServerApp] Serving notebooks from local directory: /workspace
    [I 2025-03-19 09:50:27.264 ServerApp] Jupyter Server 2.14.2 is running at:
    [I 2025-03-19 09:50:27.264 ServerApp] http://jupyter-deploy-64b8b56664-f8x2g:8888/lab?token=ddf27395439c1197201cf7fa6d8e350e72c80d67f3a1d831
    [I 2025-03-19 09:50:27.264 ServerApp]     http://127.0.0.1:8888/lab?token=ddf27395439c1197201cf7fa6d8e350e72c80d67f3a1d831
    [I 2025-03-19 09:50:27.264 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
    [C 2025-03-19 09:50:27.270 ServerApp]
    
        To access the server, open this file in a browser:
            file:///workspace/.jupyter/runtime/jpserver-1-open.html
        Or copy and paste one of these URLs:
            http://jupyter-deploy-64b8b56664-f8x2g:8888/lab?token=ddf27395439c1197201cf7fa6d8e350e72c80d67f3a1d831
            http://127.0.0.1:8888/lab?token=ddf27395439c1197201cf7fa6d8e350e72c80d67f3a1d831
    [I 2025-03-19 09:50:27.279 ServerApp] Skipped non-installed server(s): bash-language-server, dockerfile-language-server-nodejs, javascript-typescript-langserver, jedi-language-server, julia-language-server, pyright, python-language-server, python-lsp-server, r-languageserver, sql-language-server, texlab, typescript-language-server, unified-language-server, vscode-css-languageserver-bin, vscode-html-languageserver-bin, vscode-json-languageserver-bin, yaml-language-server
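  6. Run the following command to list all ServiceExporter resources in the namespace, as shown at highlight ① in the figure below.

    kubectl get serviceexporter -n jupyter
  7. Run the following command to show the details of the ServiceExporter resource in the namespace, as shown at highlight ② in the figure below. In this example the URL is https://jupyter-svc-x-jupyter-x-vcrbcqty8ibg.sproxy.hd-01.alayanew.com

    kubectl describe serviceexporter jupyter-svc -n jupyter

    Figure: Viewing the ServiceExporter

    Services published through a ServiceExporter use port 22443 by default, and the port must be given explicitly when opening the page in a browser. For example, if the URL is https://jupyter-svc-x-jupyter-x-vcrbcqty8ibg.sproxy.hd-01.alayanew.com, the actual login URL is https://jupyter-svc-x-jupyter-x-vcrbcqty8ibg.sproxy.hd-01.alayanew.com:22443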
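
The login token from step 5 can also be picked out of the log text programmatically instead of by eye. The sketch below assumes the token is a lowercase hexadecimal string following token= in a URL, as in the sample log; the name extract_token is hypothetical.

```python
import re
from typing import Optional

# Matches "?token=<hex>" or "&token=<hex>" inside a URL in the log text.
TOKEN_PATTERN = re.compile(r"[?&]token=([0-9a-f]+)")


def extract_token(log_text: str) -> Optional[str]:
    """Return the first token found in Jupyter server log output, or None."""
    match = TOKEN_PATTERN.search(log_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = (
        "[I 2025-03-19 09:50:27.264 ServerApp]     "
        "http://127.0.0.1:8888/lab?token=ddf27395439c1197201cf7fa6d8e350e72c80d67f3a1d831"
    )
    print(extract_token(sample))
```

In practice the input would be the saved output of the kubectl logs command from step 5.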

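Access the service

Open the login URL obtained above and sign in with the token. A sample page is shown in the figure below.

Figure: Jupyter login page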
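
To make the URL rule concrete, the sketch below combines the exported URL, the default port 22443, and the token into a single link. The /lab?token= form is taken from the URLs printed in the Pod logs; whether the token is accepted as a query parameter through the public proxy is an assumption here, and entering the token on the login page remains the documented route. The name login_url is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

DEFAULT_EXPORT_PORT = 22443  # default ServiceExporter port stated in this guide


def login_url(exported_url: str, token: str, port: int = DEFAULT_EXPORT_PORT) -> str:
    """Append the port and a token query parameter to the exported URL."""
    parts = urlsplit(exported_url)  # expects a URL that includes its scheme
    netloc = f"{parts.hostname}:{port}"
    return urlunsplit((parts.scheme, netloc, "/lab", urlencode({"token": token}), ""))


if __name__ == "__main__":
    print(login_url(
        "https://jupyter-svc-x-jupyter-x-vcrbcqty8ibg.sproxy.hd-01.alayanew.com",
        "<your-token>",  # placeholder; use the token from the Pod logs
    ))
```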

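Summary

This guide used a custom image to deploy an instance combining Jupyter and PyTorch on an Alaya NeW elastic container cluster, showing how a custom service can be deployed to such a cluster. You can follow the same approach to containerize your own services and deploy them to the cloud, which can improve development efficiency and resource utilization.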
