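Deploying a Jupyter Remote Development Environment
Prerequisites
- This deployment uses Kubernetes. Make sure the Kubernetes command-line client kubectl is available locally; for installation, see Install kubectl.
- You have provisioned an elastic container cluster and it is working normally. If you have not provisioned one yet, see Provision an Elastic Container Cluster.
- You have installed the Aladdin plugin; for the steps, see the Install Aladdin section.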
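Preparation
Download the source files
Download the source files required for this example. The example consists of the following files, whose purposes are described below.
| File | Purpose |
|---|---|
| Dockerfile | Builds the Docker image. |
| config_harbor_secret.json | Holds the sensitive settings for the Harbor container image registry. |
| jupyter_harbor_secret.yaml | Defines the Secret resource used to pull the custom image when the Deployment is created. |
| jupyter_deploy.yaml | Defines the Deployment resource, which controls how Pods are started and stopped. |
| jupyter_svc.yaml | Defines the Service resource, which handles networking and exposes the service. |
| jupyter_serviceexport.yaml | Defines the ServiceExporter resource, which publishes the service to the public internet. |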
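Source file walkthrough
- Dockerfile: builds a custom image on top of a PyTorch base image. The file contents are shown below.
Dockerfile contents
# Use the official PyTorch image as the base image
FROM pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel
# Install additional Python packages
RUN pip install --no-cache-dir jupyterlab pandas matplotlib
# Set the working directory
WORKDIR /workspace
# Set the JUPYTER_DATA_DIR environment variable
ENV JUPYTER_DATA_DIR=/workspace/.jupyter
# Expose the default Jupyter port
EXPOSE 8888
# Start JupyterLab
CMD ["jupyter", "lab", "--ip=0.0.0.0", "--port=8888", "--allow-root", "--no-browser"]
- config_harbor_secret.json: holds the basic connection details for the Harbor image registry used in this example. The file contents are shown below; replace the placeholder values with your own.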
config_harbor_secret.json contents
{
  "auths": {
    "your_harbor_server": {
      "username": "your_username",
      "password": "your_password",
      "email": "your_email"
    }
  }
}
| Variable | Description | Source | Example |
|---|---|---|---|
| your_harbor_server | Harbor server address | Resource Center / Storage Management / Image Registry page | https://registry.hd-01.alayanew.com:8443 |
| username | Harbor login username | Activation SMS | user |
| password | Harbor login password | Activation SMS | password |
| email | User email address | - | user@example.com |
- jupyter_harbor_secret.yaml: defines the Secret that stores the registry credentials. A Secret is the Kubernetes resource for storing and managing sensitive data such as passwords, API keys, and certificates. The file contents are shown below; replace the encoded value with your own.
jupyter_harbor_secret.yaml contents
apiVersion: v1
kind: Secret
metadata:
  name: harbor-secret
  namespace: jupyter
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: ewogICJhdXRocyI6IHsKICAgICJyZWdpc3RyeS5oZC0wMS5hbGF5YW5ldy5jb206ODQ0MyI6IHsKICAgICAgInVzZXJuYW1lIjogInZjLWh1YW5neHMiLAogICAgICAicGFzc3dvcmQiOiAiQWJjMTIzNDU2IiwKICAgICAgImVtYWlsIjogImh1YW5neHNAemV0eXVuLmNvbSIKICAgIH0KICB9Cn0K
| Variable | Description | Source | Example |
|---|---|---|---|
| .dockerconfigjson | Base64 encoding of config_harbor_secret.json | Encoded manually | 0ssdxkcjuielsdjf... |
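The `.dockerconfigjson` value can be produced from `config_harbor_secret.json` with the standard `base64` tool. The following is a minimal sketch and not part of the original example files; the function name `encode_dockerconfig` is hypothetical. The output is piped through `tr` because some `base64` implementations wrap long lines, while the Secret field expects a single unbroken line.

```shell
#!/bin/sh
# Print the base64 encoding of a Docker config file as one unwrapped line,
# suitable for the .dockerconfigjson field of the Secret.
# encode_dockerconfig is an illustrative helper name, not part of the example files.
encode_dockerconfig() {
    base64 < "$1" | tr -d '\n'
}

# Usage (run in the directory that holds config_harbor_secret.json):
#   encode_dockerconfig config_harbor_secret.json
```

Paste the printed string after `.dockerconfigjson:` in `jupyter_harbor_secret.yaml`.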
- jupyter_deploy.yaml: defines the Deployment. The file contents are shown below; replace the image, GPU resource name, and PVC name with your own values.
jupyter_deploy.yaml contents
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jupyter-deploy
  namespace: jupyter
  labels:
    app: jy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jy
  template:
    metadata:
      labels:
        app: jy
    spec:
      restartPolicy: Always
      securityContext:
      containers:
        - name: sd-cuda-container
          image: <registry-address>/pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel-ssh-1.0 # replace with your own image
          resources:
            requests:
              memory: "4Gi"
              cpu: "500m"
              nvidia.com/gpu-h800: 1 # replace with your cluster's GPU resource name
            limits:
              memory: "8Gi"
              cpu: "1000m"
              nvidia.com/gpu-h800: 1 # replace with your cluster's GPU resource name
          ports:
            - containerPort: 8888
              name: http-port
              protocol: TCP
          volumeMounts:
            - name: workspace
              mountPath: "/workspace"
              subPath: "jupyter/workspace"
      imagePullSecrets:
        - name: harbor-secret
      volumes:
        - name: workspace
          persistentVolumeClaim:
            claimName: pvc-capacity-userdata
| Variable | Description | Source | Example |
|---|---|---|---|
| image | Image name | Your custom image | registry.hd-01.alayanew.com:8443/alayanew-dab57f9b-35f5-4dc1-afff-5cfd02esdsfe/pytorch/pytorch:2.5.1-cuda12.4-cudnn9-devel-ssh-1.0 |
| resources.requests.[GPU] | GPU resource name | Elastic Container Cluster / Cluster Details / Compute Configuration | nvidia.com/gpu-h800 |
| volumes.persistentVolumeClaim.claimName | PVC name | The PVC created by default; see Declare Storage | pvc-capacity-userdata |
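- jupyter_svc.yaml: defines the Service. This example uses a ClusterIP Service that routes in-cluster traffic to the Pod on TCP port 8888; the ServiceExporter defined in jupyter_serviceexport.yaml is what publishes it to the public internet.
jupyter_svc.yaml contents
apiVersion: v1
kind: Service
metadata:
  name: jupyter-svc
  namespace: jupyter
spec:
  selector:
    # The selector lets the Service find the right Pods.
    # Make sure the Pod's metadata.labels match this selector.
    app: jy
  ports:
    - protocol: TCP
      port: 8888        # Port the Service exposes inside the cluster
      targetPort: 8888  # Port inside the Pod's container
  type: ClusterIP       # Service type; ClusterIP is reachable only from inside the cluster (NodePort would allow external access)
- jupyter_serviceexport.yaml: defines the ServiceExporter resource, which publishes the Service to the public internet.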
jupyter_serviceexport.yaml contents
# Publish a service from the vcluster
apiVersion: osm.datacanvas.com/v1alpha1
kind: ServiceExporter
metadata:
  name: jupyter-svc # immutable
  namespace: jupyter
spec:
  serviceName: jupyter-svc # required
  servicePort: 8888
Procedure
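Configure the image
- Run the following command to pull the specified image from a remote registry (Docker Hub by default) to your machine, as shown at highlight ① in the figure below.
docker pull <image-name>:<tag>
- Run the following command to build a new image from the Dockerfile and give it a unique name and tag, as shown at highlight ② in the figure below.
docker build -t <name>:<tag> -f <path-to-Dockerfile> .
- Run the following command to log in to the private Harbor image registry, as shown at highlight ③ in the figure above.
echo <password> | docker login <registry-address> -u <username> --password-stdin
- Run the following command to give the existing local image a new tag that points at the target registry, as shown at highlight ① in the figure below.
docker tag <source-image>:<source-tag> <target-image>:<target-tag>
- Run the following command to push the tagged image to the target registry, as shown at highlight ② in the figure below.
docker push <target-image>:<tag>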
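Image references passed to `docker tag` and `docker push` do not carry a URL scheme, whereas the Harbor address is listed earlier with an `https://` prefix. A minimal sketch of composing the target reference follows. `harbor_image_ref` is a hypothetical helper, and the project name `my-project` and the image names in the usage comment are placeholders rather than values from this guide.

```shell
#!/bin/sh
# Build "<registry-host>/<project>/<repository>:<tag>" from a Harbor address
# that may carry an http:// or https:// prefix and a trailing slash.
# harbor_image_ref is an illustrative helper name.
harbor_image_ref() {
    registry=${1#http://}
    registry=${registry#https://}
    registry=${registry%/}
    printf '%s/%s/%s\n' "$registry" "$2" "$3"
}

# Usage sketch (placeholders; requires a local image that is already built):
#   target=$(harbor_image_ref "https://registry.hd-01.alayanew.com:8443" "my-project" "pytorch/pytorch:jupyter-1.0")
#   docker tag "pytorch-jupyter:1.0" "$target"
#   docker push "$target"
```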
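Deploy the resources
- Run the following command to point kubectl at the elastic container cluster's kubeconfig file, as shown at highlight ① in the figure below.
export KUBECONFIG="</path/to/kubeconfig>"
- Run the following command to create a Namespace, as shown at highlight ② in the figure below. The manifests in this example use the namespace jupyter.
kubectl create namespace <namespace-name>
- Run the following command to apply the resources defined in jupyter_harbor_secret.yaml to the Kubernetes cluster, as shown at highlight ③ in the figure below.
kubectl apply -f jupyter_harbor_secret.yaml
- Run the following command to apply the resources defined in jupyter_deploy.yaml to the Kubernetes cluster, as shown at highlight ④ in the figure below.
kubectl apply -f jupyter_deploy.yaml
- Run the following command to apply the resources defined in jupyter_svc.yaml to the Kubernetes cluster, as shown at highlight ⑤ in the figure below.
kubectl apply -f jupyter_svc.yaml
- Run the following command to apply the resources defined in jupyter_serviceexport.yaml to the Kubernetes cluster, as shown at highlight ⑥ in the figure below.
kubectl apply -f jupyter_serviceexport.yaml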
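The four `kubectl apply` steps can also be run in sequence from one script. This is a sketch under the assumption that all four files sit in the current directory and the namespace already exists. The `KUBECTL` variable and the `apply_jupyter_manifests` function are hypothetical conveniences, added so the commands can be printed instead of executed.

```shell
#!/bin/sh
# Apply the example manifests in dependency order: Secret first (needed to
# pull the image), then Deployment, Service, and ServiceExporter.
# KUBECTL is an illustrative override; set KUBECTL="echo kubectl" to preview.
KUBECTL=${KUBECTL:-kubectl}

apply_jupyter_manifests() {
    for manifest in \
        jupyter_harbor_secret.yaml \
        jupyter_deploy.yaml \
        jupyter_svc.yaml \
        jupyter_serviceexport.yaml
    do
        # Stop at the first failure so later resources are not applied
        # on top of a broken earlier step.
        $KUBECTL apply -f "$manifest" || return 1
    done
}

# Usage:
#   export KUBECONFIG="</path/to/kubeconfig>"
#   apply_jupyter_manifests
```

Running with `KUBECTL="echo kubectl"` prints the four commands without contacting the cluster, which is a cheap way to check the order before applying anything.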
View the resources
- Run the following command to list all Pods in the namespace, as shown at highlight ① in the figure below.
kubectl get pods -n <namespace>
- Run the following command to list all Deployments in the namespace, as shown at highlight ② in the figure below.
kubectl get deploy -n <namespace>
- Run the following command to list all Services in the namespace, as shown at highlight ② in the figure below.
kubectl get svc -n <namespace>
- Run the following command to show the detailed description of a Pod in the namespace; replace the Pod name with the one returned by kubectl get pods. Sample output is given under "Pod description".
kubectl describe pod jupyter-deploy-64b8b56664-f8x2g -n jupyter
Pod description
Name:             jupyter-deploy-64b8b56664-f8x2g
Namespace:        jupyter
Priority:         0
Service Account:  default
Node:             k8s-mas-gpu-8-128/100.64.8.128
Start Time:       Wed, 19 Mar 2025 17:50:25 +0800
Labels:           app=jy
                  pod-template-hash=64b8b56664
Annotations:      <none>
Status:           Running
IP:               172.19.32.233
IPs:
  IP:           172.19.32.233
Controlled By:  ReplicaSet/jupyter-deploy-64b8b56664
Containers:
  sd-cuda-container:
    Container ID:   containerd://2ec5cd2f1b5d3de081da49d72f1a565667850d026c4a6f41e4b206cc2567675a
    Image:          registry.hd-01.alayanew.com:8443/vc-app_market/pytorch/pytorch:2.5.1-cuda12.4-cudnn9-jupyter-devel
    Image ID:       registry.hd-01.alayanew.com:8443/vc-app_market/pytorch/pytorch@sha256:dd33a84d2d6f60c3343697c044135e75bb59e267255f1e99dab30d7ed07a389a
    Port:           8888/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Wed, 19 Mar 2025 17:50:26 +0800
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:                            1
      memory:                         8Gi
      nvidia.com/gpu-h100-80gb-hbm3:  1
    Requests:
      cpu:                            500m
      memory:                         4Gi
      nvidia.com/gpu-h100-80gb-hbm3:  1
    Environment:                      <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-56ls8 (ro)
      /workspace from workspace (rw,path="jupyter/workspace")
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  workspace:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-capacity-userdata
    ReadOnly:   false
  kube-api-access-56ls8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>
- Run the following command to view the Pod's log and obtain the login token, which appears as part of the server URL; replace the Pod name with your own. Sample output is given under "Pod log". In this example the token is ddf27395439c1197201cf7fa6d8e350e72c80d67f3a1d831.
kubectl logs jupyter-deploy-64b8b56664-f8x2g -n jupyter
Pod log
==========
== CUDA ==
==========
CUDA Version 12.4.1
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
[I 2025-03-19 09:50:27.049 ServerApp] jupyter_lsp | extension was successfully linked.
[I 2025-03-19 09:50:27.051 ServerApp] jupyter_server_terminals | extension was successfully linked.
[I 2025-03-19 09:50:27.054 ServerApp] jupyterlab | extension was successfully linked.
[I 2025-03-19 09:50:27.223 ServerApp] notebook_shim | extension was successfully linked.
[I 2025-03-19 09:50:27.233 ServerApp] notebook_shim | extension was successfully loaded.
[I 2025-03-19 09:50:27.234 ServerApp] jupyter_lsp | extension was successfully loaded.
[I 2025-03-19 09:50:27.235 ServerApp] jupyter_server_terminals | extension was successfully loaded.
[I 2025-03-19 09:50:27.235 LabApp] JupyterLab extension loaded from /opt/conda/lib/python3.11/site-packages/jupyterlab
[I 2025-03-19 09:50:27.235 LabApp] JupyterLab application directory is /opt/conda/share/jupyter/lab
[I 2025-03-19 09:50:27.236 LabApp] Extension Manager is 'pypi'.
[I 2025-03-19 09:50:27.264 ServerApp] jupyterlab | extension was successfully loaded.
[I 2025-03-19 09:50:27.264 ServerApp] Serving notebooks from local directory: /workspace
[I 2025-03-19 09:50:27.264 ServerApp] Jupyter Server 2.14.2 is running at:
[I 2025-03-19 09:50:27.264 ServerApp] http://jupyter-deploy-64b8b56664-f8x2g:8888/lab?token=ddf27395439c1197201cf7fa6d8e350e72c80d67f3a1d831
[I 2025-03-19 09:50:27.264 ServerApp]     http://127.0.0.1:8888/lab?token=ddf27395439c1197201cf7fa6d8e350e72c80d67f3a1d831
[I 2025-03-19 09:50:27.264 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2025-03-19 09:50:27.270 ServerApp]
    To access the server, open this file in a browser:
        file:///workspace/.jupyter/runtime/jpserver-1-open.html
    Or copy and paste one of these URLs:
        http://jupyter-deploy-64b8b56664-f8x2g:8888/lab?token=ddf27395439c1197201cf7fa6d8e350e72c80d67f3a1d831
        http://127.0.0.1:8888/lab?token=ddf27395439c1197201cf7fa6d8e350e72c80d67f3a1d831
[I 2025-03-19 09:50:27.279 ServerApp] Skipped non-installed server(s): bash-language-server, dockerfile-language-server-nodejs, javascript-typescript-langserver, jedi-language-server, julia-language-server, pyright, python-language-server, python-lsp-server, r-languageserver, sql-language-server, texlab, typescript-language-server, unified-language-server, vscode-css-languageserver-bin, vscode-html-languageserver-bin, vscode-json-languageserver-bin, yaml-language-server
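- Run the following command to list all ServiceExporter resources in the namespace, as shown at highlight ① in the figure below.
kubectl get serviceexporter -n jupyter
- Run the following command to show the details of the ServiceExporter resource in the namespace, as shown at highlight ② in the figure below. The resource name matches metadata.name in jupyter_serviceexport.yaml. In this example the URL is https://jupyter-svc-x-jupyter-x-vcrbcqty8ibg.sproxy.hd-01.alayanew.com
kubectl describe serviceexporter jupyter-svc -n jupyter
Services published through a ServiceExporter listen on port 22443 by default, and the port must be included when opening the page in a browser. For example, if the URL is https://jupyter-svc-x-jupyter-x-vcrbcqty8ibg.sproxy.hd-01.alayanew.com, the actual login URL is https://jupyter-svc-x-jupyter-x-vcrbcqty8ibg.sproxy.hd-01.alayanew.com:22443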
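Rather than reading the token out of the log by eye, it can be filtered from the log stream. This is a minimal sketch, assuming the log contains a `token=` query parameter followed by lowercase hex digits, as in the sample Pod log; `extract_jupyter_token` is a hypothetical helper that reads log text on standard input.

```shell
#!/bin/sh
# Read Jupyter server log text on stdin and print the first login token.
# Relies on the "?token=<hex digits>" form seen in the sample log;
# extract_jupyter_token is an illustrative helper name.
extract_jupyter_token() {
    grep -o 'token=[0-9a-f]*' | head -n 1 | cut -d= -f2
}

# Usage (Pod name is whatever `kubectl get pods -n jupyter` reported):
#   kubectl logs <pod-name> -n jupyter | extract_jupyter_token
```

The same token appears several times in the log, so only the first match is kept.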
Access the service
Open the login URL obtained above and sign in with the token. A sample page is shown in the figure below.

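Summary
This article built a custom image and used it to deploy an instance combining Jupyter Notebook and PyTorch on an Alaya NeW elastic container cluster, showing how a custom service can be deployed to the cluster. The same procedure can be adapted to containerize and deploy your own services in the cloud, which can improve development efficiency and resource utilization.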
