Configuring Docker for GPU support
Many AI workloads now run inside Docker. Docker has no GPU access by default, so it has to be configured to support GPUs.
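Installing the driver
Open the driver download page on the NVIDIA website and download the driver that matches your GPU model and operating system.
Run the installer. It may report missing dependencies; install whatever is missing and run it again.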
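On CentOS 7 the installer may fail with an error saying the kernel source cannot be found. In that case, set the kernel source path manually by adding the option
--kernel-source-path /usr/src/kernels/3.10.0-1160.119.1.el7.x86_64/
to the installer command. The directory is provided by the kernel-devel package, and its name should match your kernel version.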
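The path above is specific to one kernel release. A minimal sketch of deriving it from the running kernel instead, assuming the kernel-devel package for that exact kernel is installed; `kernel_source_path` is a hypothetical helper name, not part of the NVIDIA installer:

```shell
# Hypothetical helper: build the kernel source path for a kernel release.
# Falls back to the running kernel (uname -r) when no release is given.
kernel_source_path() {
    local release="${1:-$(uname -r)}"
    printf '/usr/src/kernels/%s\n' "$release"
}

# Example use with the installer from this guide (left commented out):
# bash NVIDIA-Linux-x86_64-440.95.01.run -a -s -Z -X \
#     --kernel-source-path "$(kernel_source_path)"
```

If the directory does not exist, the installed kernel-devel version probably differs from the running kernel, so compare it with the output of `uname -r`.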
# The installer needs gcc and the kernel headers
yum install gcc
yum install kernel-devel
bash NVIDIA-Linux-x86_64-440.95.01.run -a -s -Z -X
- Verify the installation
nvidia-smi
# Fri Jun 14 16:55:37 2024
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 440.95.01 Driver Version: 440.95.01 CUDA Version: 10.2 |
# |-------------------------------+----------------------+----------------------+
# | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
# | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
# |===============================+======================+======================|
# | 0 Tesla V100-PCIE... Off | 00000000:00:03.0 Off | 0 |
# | N/A 32C P0 36W / 250W | 0MiB / 16160MiB | 0% Default |
# +-------------------------------+----------------------+----------------------+
#
# +-----------------------------------------------------------------------------+
# | Processes: GPU Memory |
# | GPU PID Type Process name Usage |
# |=============================================================================|
# | No running processes found |
# +-----------------------------------------------------------------------------+
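For a scripted check, the driver version can be pulled out of the `nvidia-smi` banner shown above. This is only a sketch: `driver_version_from_banner` is a hypothetical helper, and it assumes the banner keeps the `Driver Version: X.Y.Z` layout seen in this output.

```shell
# Hypothetical helper: read nvidia-smi output on stdin and print the
# driver version found on the first "Driver Version:" line.
driver_version_from_banner() {
    sed -n 's/.*Driver Version: *\([0-9.]*\).*/\1/p' | head -n 1
}

# Example use on a machine with the driver installed (left commented out):
# nvidia-smi | driver_version_from_banner
```

An empty result means no banner line was found, which usually indicates that the driver is not loaded.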
Installing Docker
- Install with the official one-click script
curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun
- Start Docker and enable it at boot
systemctl start docker
systemctl enable docker
- Verify the installation
docker ps
# CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
Installing nvidia-container-toolkit
- Configure the yum repository (enabling the experimental packages is optional)
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
yum-config-manager --enable nvidia-container-toolkit-experimental
- Install the toolkit
yum install -y nvidia-container-toolkit
- Configure Docker to use the NVIDIA runtime
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker
- Test. Output like the following means the installation succeeded
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
# Fri Jun 14 09:24:04 2024
# +-----------------------------------------------------------------------------+
# | NVIDIA-SMI 440.95.01 Driver Version: 440.95.01 CUDA Version: 10.2 |
# |-------------------------------+----------------------+----------------------+
# | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
# | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
# |===============================+======================+======================|
# | 0 Tesla V100-PCIE... Off | 00000000:00:03.0 Off | 0 |
# | N/A 36C P0 35W / 250W | 0MiB / 16160MiB | 0% Default |
# +-------------------------------+----------------------+----------------------+
#
# +-----------------------------------------------------------------------------+
# | Processes: GPU Memory |
# | GPU PID Type Process name Usage |
# |=============================================================================|
# | No running processes found |
# +-----------------------------------------------------------------------------+
Configuring containerd for k8s
sudo nvidia-ctk runtime configure --runtime=containerd
sudo systemctl restart containerd
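Both `nvidia-ctk runtime configure` commands in this guide work by editing the runtime's config file. A rough way to confirm that the edit happened is sketched below. It assumes the default config locations (`/etc/docker/daemon.json` and `/etc/containerd/config.toml`) and that the generated entry refers to the `nvidia-container-runtime` binary by name; `config_mentions_nvidia` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if FILE is readable and mentions the
# NVIDIA container runtime binary.
config_mentions_nvidia() {
    local file="$1"
    [ -r "$file" ] && grep -q 'nvidia-container-runtime' "$file"
}

# Report on both config files; the default paths are assumptions.
for cfg in /etc/docker/daemon.json /etc/containerd/config.toml; do
    if config_mentions_nvidia "$cfg"; then
        echo "$cfg: nvidia runtime configured"
    else
        echo "$cfg: nvidia runtime not found"
    fi
done
```

This only inspects the files. The services still need the restart shown above before the new runtime takes effect.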
References
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html