Verifying HSOpticalFlow with NVIDIA Docker

Background:

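Docker grew out of dotCloud, a platform-as-a-service company, and on its own it does not expose the host's GPUs to containers. NVIDIA saw the value in it and built the GPU side itself, naming it NVIDIA Container Toolkit. You can think of it as Docker plus a GPU plugin. Deep learning environments can then live in containers, so there is no need to change middleware versions on the host.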

Main text

This server already had Docker installed, but without NVIDIA Container Toolkit, running GPU images failed with an error. After installing the toolkit, they ran normally.

The error was:

docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:24.04-py3
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

Installation commands:

distribution="ubuntu20.04"  # change this to your own distribution version
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
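
The `distribution` value above is the OS ID followed by its version ID. As a minimal sketch, assuming the host has a standard `/etc/os-release` file, the string could be derived rather than typed by hand. The function name and the sample content below are illustrative, not taken from the server:

```python
def distribution_from_os_release(text: str) -> str:
    """Return ID + VERSION_ID (e.g. 'ubuntu20.04') from os-release content."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields["ID"] + fields["VERSION_ID"]


# Sample content in the format of /etc/os-release (illustrative only).
sample = 'NAME="Ubuntu"\nVERSION_ID="20.04"\nID=ubuntu\nID_LIKE=debian\n'
print(distribution_from_os_release(sample))  # ubuntu20.04
```

On a real host the input would be `open("/etc/os-release").read()`.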

To verify the installation, run:

dpkg -l | grep nvidia-container-toolkit

Output:

ii  nvidia-container-toolkit                    1.13.5-1                            amd64        NVIDIA Container toolkit
ii  nvidia-container-toolkit-base               1.13.5-1                            amd64        NVIDIA Container Toolkit Base
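
If this check needs to be scripted, the `dpkg -l` rows can be parsed into a name-to-version mapping. This is only a sketch: `parse_dpkg_lines` is a hypothetical helper, and it assumes the usual column order of status, name, version, architecture, description.

```python
def parse_dpkg_lines(output: str) -> dict:
    """Map package name -> version for installed ('ii') rows of `dpkg -l` output."""
    packages = {}
    for line in output.splitlines():
        parts = line.split()
        # Expected columns: status, name, version, architecture, description...
        if len(parts) >= 4 and parts[0] == "ii":
            packages[parts[1]] = parts[2]
    return packages


# The two rows shown above, pasted in as sample input.
sample = """\
ii  nvidia-container-toolkit                    1.13.5-1                            amd64        NVIDIA Container toolkit
ii  nvidia-container-toolkit-base               1.13.5-1                            amd64        NVIDIA Container Toolkit Base
"""
installed = parse_dpkg_lines(sample)
print(installed["nvidia-container-toolkit"])  # 1.13.5-1
```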

Then configure /etc/docker/daemon.json so that Docker uses the NVIDIA runtime by default when it starts:

{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}
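
If the server already has a daemon.json with other settings, pasting the snippet above over it would drop them. A minimal sketch of merging instead, assuming the existing file is valid JSON; `with_nvidia_runtime` and the `log-driver` entry are hypothetical names used only for illustration:

```python
import json


def with_nvidia_runtime(config: dict, make_default: bool = True) -> dict:
    """Return a copy of a daemon.json dict with the nvidia runtime registered."""
    merged = dict(config)
    runtimes = dict(merged.get("runtimes", {}))
    runtimes["nvidia"] = {"path": "nvidia-container-runtime", "runtimeArgs": []}
    merged["runtimes"] = runtimes
    if make_default:
        merged["default-runtime"] = "nvidia"
    return merged


# Hypothetical pre-existing configuration; other keys are preserved.
existing = {"log-driver": "json-file"}
print(json.dumps(with_nvidia_runtime(existing), indent=2))
```

The result would still have to be written back to /etc/docker/daemon.json with root privileges before restarting Docker.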

Restart Docker:

sudo systemctl restart docker
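
After the restart, one way to confirm that the daemon picked up the new runtime, before pulling a large image, is to ask `docker info` for its runtime list. The sketch below assumes `docker` is on PATH and that `docker info --format '{{json .Runtimes}}'` prints a JSON object keyed by runtime name. The sample string is illustrative, not captured from the server:

```python
import json
import subprocess


def has_nvidia_runtime(runtimes_json: str) -> bool:
    """True if the JSON object of runtimes contains an 'nvidia' entry."""
    return "nvidia" in json.loads(runtimes_json)


def docker_runtimes_json() -> str:
    """Ask the Docker daemon for its registered runtimes (needs docker on PATH)."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .Runtimes}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Illustrative string in the assumed output shape.
sample = '{"nvidia": {"path": "nvidia-container-runtime"}, "runc": {"path": "runc"}}'
print(has_nvidia_runtime(sample))  # True
```

On the server itself the call would be `has_nvidia_runtime(docker_runtimes_json())`.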

Verify that Docker can use the GPU:

yhp1szh@SZH-C-006RW:/$ docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:24.04-py3


=============
== PyTorch ==
=============

NVIDIA Release 24.04 (build 88113656)
PyTorch Version 2.3.0a0+6ddf5cf
Container image Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copyright (c) 2014-2024 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies    (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU                      (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006      Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015      Google Inc.
Copyright (c) 2015      Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for PyTorch.  NVIDIA recommends the use of the following flags:
   docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 ...

root@e09a413296bb:/workspace# nvidia-smi
Thu May  9 05:29:47 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:17:00.0 Off |                  N/A |
| 75%   62C    P2            203W /  350W |    8385MiB /  24576MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off |   00000000:98:00.0 Off |                  N/A |
| 43%   41C    P8             18W /  350W |      10MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
root@e09a413296bb:/workspace# nvcc -C
nvcc fatal   : Unknown option '-C'
root@e09a413296bb:/workspace# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
root@e09a413296bb:/workspace# python
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import torchvision
>>> torch.__version__
'2.3.0a0+6ddf5cf85e.nv24.04'
>>> torch.cuda.get_device_name(0)
'NVIDIA GeForce RTX 3090'
>>> torch.backends.cudnn.version()
90100
>>> torchvision.__version__
'0.18.0a0'
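
The interactive checks above can also be collected into a small script, which is handy when the same verification has to be repeated on another machine. This is a sketch: `gpu_report` is a hypothetical helper, and it returns a placeholder when PyTorch is not installed so that it also runs outside the container.

```python
def gpu_report() -> dict:
    """Collect the same facts as the interactive session into one dict."""
    try:
        import torch
    except ImportError:
        # Outside the container PyTorch may be missing; report that instead of failing.
        return {"torch": None}
    report = {
        "torch": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "devices": [],
    }
    if report["cuda_available"]:
        report["devices"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
        report["cudnn"] = torch.backends.cudnn.version()
    return report


print(gpu_report())
```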

Next, get the code onto the machine and test it.

Because of proxy problems, downloading the code directly from GitHub was too difficult, so I uploaded it manually.

First, upload the code to this folder on the server:

/mnt/workspace/xiebell/pytorch2404/

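A container started with --rm is deleted automatically every time it exits. Here we do not want it deleted, so start it without --rm.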
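
To make the difference concrete, here is a sketch that assembles the `docker run` command with or without `--rm`. The helper `docker_run_command` and the container name `pytorch2404` are hypothetical. The original post refers to the container by its ID instead of a name.

```python
import shlex


def docker_run_command(image: str, keep: bool = True, name: str = "") -> list:
    """Build a `docker run` argv; keep=True omits --rm so the container survives exit."""
    cmd = ["docker", "run", "--gpus", "all", "-it"]
    if not keep:
        cmd.append("--rm")
    if name:
        cmd += ["--name", name]
    cmd.append(image)
    return cmd


cmd = docker_run_command("nvcr.io/nvidia/pytorch:24.04-py3", name="pytorch2404")
print(shlex.join(cmd))
# docker run --gpus all -it --name pytorch2404 nvcr.io/nvidia/pytorch:24.04-py3
```

A container created this way stays in `docker ps -a` after it exits and can be started again later.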

After the code was uploaded, I started the container:

docker start 286f0ced9277

docker attach 286f0ced9277

Copy the file from the server into the container:

Open another terminal:

docker cp /mnt/workspace/xiebell/pytorch2404/cuda-samples-master.zip 286f0ced9277:/workspace/xiebell/cuda.zip

Output:

Successfully copied 146.6MB to 286f0ced9277:/workspace/xiebell/cuda.zip

The file is now visible inside the container.

Unzip it.
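
The post does not show the command used for this step. One option, since Python is available in this container, is the standard-library `zipfile` module. A minimal sketch, with `extract_archive` as a hypothetical helper name:

```python
import zipfile
from pathlib import Path


def extract_archive(archive, dest) -> list:
    """Extract a zip archive into dest and return its top-level entry names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return sorted({name.split("/")[0] for name in zf.namelist()})


# Possible call inside the container, using the path from the docker cp step:
# extract_archive("/workspace/xiebell/cuda.zip", "/workspace/xiebell")
```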

At this point the files are ready inside the container:

root@286f0ced9277:/workspace/xiebell/cuda-samples-master# ls
CHANGELOG.md  LICENSE   README.md  Samples_VS2017.sln  Samples_VS2022.sln
Common        Makefile  Samples    Samples_VS2019.sln  bin

The next step is to compile.

Compile:

$ cd <sample_dir>
$ make
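
Here `<sample_dir>` is the directory of the individual sample, several levels below the repository root. As a sketch, the directory can be located by name and `make` run there. Both helper names are hypothetical, and the search assumes each sample directory contains its own Makefile:

```python
import subprocess
from pathlib import Path


def find_sample_dir(root, sample_name):
    """Return the first directory named sample_name under root that has a Makefile."""
    for candidate in sorted(Path(root).rglob(sample_name)):
        if candidate.is_dir() and (candidate / "Makefile").is_file():
            return candidate
    return None


def build_sample(sample_dir) -> None:
    """Run `make` inside sample_dir; raises CalledProcessError if the build fails."""
    subprocess.run(["make"], cwd=sample_dir, check=True)


# Possible use inside the container:
# sample_dir = find_sample_dir("/workspace/xiebell/cuda-samples-master", "HSOpticalFlow")
# build_sample(sample_dir)
```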

Run the executable:

./HSOpticalFlow

Result:

root@286f0ced9277:/workspace/xiebell/cuda-samples-master/Samples/5_Domain_Specific/HSOpticalFlow# ./HSOpticalFlow 
HSOpticalFlow Starting...

GPU Device 0: "Ampere" with compute capability 8.6

Loading "frame10.ppm" ...
Loading "frame11.ppm" ...
Computing optical flow on CPU...
Computing optical flow on GPU...
L1 error : 0.044308

The only part I can interpret is the computed error; the rest I can't make sense of.
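
For context on that number: the log shows the flow being computed once on the CPU and once on the GPU, so the "L1 error" is presumably a measure of how far the two results differ. A sketch of one such measure is the mean absolute difference of the horizontal (u) and vertical (v) flow components per pixel. That this matches the sample's exact formula is an assumption, and the toy values below are made up:

```python
def mean_l1_error(u_ref, v_ref, u, v) -> float:
    """Mean over pixels of |u - u_ref| + |v - v_ref| for flat lists of flow components."""
    if not (len(u_ref) == len(v_ref) == len(u) == len(v)) or not u:
        raise ValueError("all inputs must be non-empty and the same length")
    total = sum(abs(a - b) + abs(c - d) for a, b, c, d in zip(u, u_ref, v, v_ref))
    return total / len(u)


# Toy 2x2 flow fields, flattened row by row (made-up numbers).
u_cpu, v_cpu = [1.0, 0.5, 0.0, -0.5], [0.0, 0.0, 1.0, 1.0]
u_gpu, v_gpu = [1.0, 0.75, 0.0, -0.5], [0.0, 0.0, 0.75, 1.0]
print(mean_l1_error(u_cpu, v_cpu, u_gpu, v_gpu))  # 0.125
```

Read this way, a value of 0.044308 would mean the GPU and CPU flow fields differ by about 0.04 pixels per pixel on average.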
