1. Problem: installing dependencies and packages while building a Docker image fails with these errors:
```text
E: Unable to locate package libcudnn7
E: Version '2.7.8-1+cuda11.0' for 'libnccl2' was not found
E: Version '2.7.8-1+cuda11.0' for 'libnccl-dev' was not found
```
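Both messages come from apt. The first means no package called `libcudnn7` is listed in the configured repositories. The second means `libnccl2`/`libnccl-dev` are listed, but not at the pinned version. One way to confirm this is to look at which versions the repository actually offers. The sketch below is a hypothetical helper (`version_available` is not part of apt or Docker) that reads `apt-cache madison` output on stdin and succeeds only if the exact pinned version appears. Running it against the base image from the Dockerfile below is an assumption about how you would invoke it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: exit 0 only if the exact version given as $1 appears
# in `apt-cache madison` output read from stdin.
# madison prints one line per available version, formatted as:
#   <package> | <version> | <repository> Packages
version_available() {
  local wanted="$1"
  awk -F'|' -v v="$wanted" '
    {
      ver = $2
      gsub(/[ \t]/, "", ver)        # strip the padding around the version column
      if (ver == v) found = 1
    }
    END { exit (found ? 0 : 1) }'
}

# Usage (assumes Docker can pull the base image used in the Dockerfile):
#   docker run --rm nvidia/cuda:11.0.3-devel-ubuntu18.04 \
#     bash -c 'apt-get update >/dev/null && apt-cache madison libnccl2' \
#     | version_available '2.7.8-1+cuda11.0' || echo "pinned libnccl2 version not in repo"
```

If the pinned version is missing, the repository simply no longer (or never did) carry that build, which matches the "was not found" errors.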
The relevant part of the Dockerfile:
```dockerfile
FROM nvidia/cuda:11.0.3-devel-ubuntu18.04
ENV PROJECT=permatrack
# ENV PYTORCH_VERSION=1.4
# ENV TORCHVISION_VERSION=0.5.0
ENV PYTORCH_VERSION=1.7
ENV TORCHVISION_VERSION=0.8.0
ENV CUDNN_VERSION=8.0.5.39+cuda11.0
ENV NCCL_VERSION=2.7.8-1+cuda11.0
ENV TRT_VERSION=7.2.3
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
RUN apt-get update && apt-get install -y --allow-downgrades --allow-change-held-packages --no-install-recommends \
    build-essential \
    cmake \
    g++-4.8 \
    git \
    curl \
    docker.io \
    vim \
    wget \
    ca-certificates \
    # libcudnn7 is the cuDNN 7 package name, while CUDNN_VERSION above is a cuDNN 8.0.5 version string
    libcudnn7=${CUDNN_VERSION} \
    libnccl2=${NCCL_VERSION} \
    libnccl-dev=${NCCL_VERSION} \
    libjpeg-dev \
    libpng-dev \
    # PYTHON_VERSION is not set in this excerpt; if it is unset, the next two lines expand to "python" and "python-dev"
    python${PYTHON_VERSION} \
    python${PYTHON_VERSION}-dev \
    python3-tk \
    librdmacm1 \
    libibverbs1 \
    libgtk2.0-dev \
    unzip \
    bzip2 \
    htop \
    gnuplot \
    ffmpeg
```
2. Solution: don't install cuDNN and NCCL at all, i.e. comment out the following 3 lines:
```dockerfile
    libcudnn7=${CUDNN_VERSION} \
    libnccl2=${NCCL_VERSION} \
    libnccl-dev=${NCCL_VERSION} \
```
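The same edit can also be scripted. This is a sketch assuming GNU sed (`-i` without a backup suffix) and that the three package lines start with the package name, optionally indented; `comment_out_pins` is a hypothetical name. It prefixes each of the three lines with `#`. Docker removes comment lines before joining `\` continuations, so the rest of the `apt-get install` list is unaffected.

```bash
#!/usr/bin/env bash
# Hypothetical helper: comment out the libcudnn7 / libnccl2 / libnccl-dev
# lines of the apt-get install list in the given Dockerfile (GNU sed assumed).
comment_out_pins() {
  local dockerfile="${1:-Dockerfile}"
  # Keep any leading indentation, then insert "# " before the package name.
  # Lines that are already commented no longer match, so reruns are harmless.
  sed -i -E 's/^([[:space:]]*)(libcudnn7=|libnccl2=|libnccl-dev=)/\1# \2/' "$dockerfile"
}

# Usage:
#   comment_out_pins Dockerfile
#   grep -n '^[[:space:]]*# lib' Dockerfile   # shows the three commented-out lines
```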
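Suspicion: later NVIDIA CUDA images already include cuDNN and NCCL, whereas the Docker setup in the repository below installs NCCL and cuDNN itself, probably because it was written against an older image. However, the image used there can no longer be found either, so the CUDA base image has to be switched to a newer one as well. The repository in question: GitHub - TRI-ML/permatrack: Implementation for Learning to Track with Object Permanence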
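One way to test this suspicion is to list the cuDNN and NCCL packages already installed in a candidate base image. The sketch below uses a hypothetical filter, `filter_gpu_libs`, on `dpkg-query` output; the image tag in the usage line is simply the one from the Dockerfile above, and other tags can be substituted.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "<package> <version>" lines on stdin and keep only
# cuDNN / NCCL packages (names starting with libcudnn or libnccl).
filter_gpu_libs() {
  grep -E '^lib(cudnn|nccl)' || true   # an empty result is not treated as an error
}

# Usage (requires Docker; lists what the base image already ships):
#   docker run --rm nvidia/cuda:11.0.3-devel-ubuntu18.04 \
#     dpkg-query -W -f='${Package} ${Version}\n' | filter_gpu_libs
```

If a package shows up in this list, pinning it again in `apt-get install` is redundant at best.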
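I also found a good blog post on installing NVIDIA CUDA images and working with containers: Ubuntu上从CUDA开始构建深度学习镜像 (Building a deep learning image from CUDA on Ubuntu) - 八十八键的宇宙 (yuxinzhao.net)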