Configuring a WSL2-based Docker environment with CUDA support

January 20, 2021

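Introduction

As described in the earlier post on enabling WSL2 on Windows 10, we can run a Linux subsystem inside Windows 10. This post shows how to install Docker on top of that setup and use the GPU from within WSL.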

Preparation

  1. Join the Windows Insider Program. The Dev channel is recommended; do not choose Beta.
  2. Install the NVIDIA WSL2-compatible driver.

Open this link -> Get CUDA Driver -> log in -> download

  3. Run PowerShell as administrator:
dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart

dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart

wsl --set-default-version 2
  4. Update WSL:
wsl.exe --update

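If the --update option is not recognized and WSL is not updated, you are not running a preview build of Windows and your WSL version is too old. In that case, update Windows to the latest preview build. The NVIDIA, Docker, and Microsoft documentation may tell you that any version above a certain number is enough, but I recommend using the latest version available.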
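To confirm from inside the distribution that the update took effect, you can compare the running kernel against a minimum version. The following is only a sketch: the helper names version_ge and kernel_base are made up for this post, and the threshold 4.19.121 is an assumption that you should replace with whatever minimum the current NVIDIA documentation states.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed when version A >= version B.
# Relies on "sort -V" (version sort), available in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# kernel_base RELEASE: strip the suffix after the first "-",
# e.g. "5.10.16.3-microsoft-standard-WSL2" -> "5.10.16.3".
kernel_base() {
  printf '%s\n' "${1%%-*}"
}

min_kernel="4.19.121"   # assumption: adjust to the documented minimum
current="$(kernel_base "$(uname -r)")"

if version_ge "$current" "$min_kernel"; then
  echo "kernel $current is new enough"
else
  echo "kernel $current is older than $min_kernel; run wsl.exe --update from Windows"
fi
```

Comparing with sort -V avoids the usual trap of string comparison, where "4.19.104" would sort after "4.19.121".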

Installing Docker

Download

Download Docker from the official Docker website. Do not use the script below.

curl https://get.docker.com | sh

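Settings

In the Docker settings, enable "Use the WSL2 based engine".

Then turn on integration for the WSL distribution in which you want to use Docker.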
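Once the distribution is enabled, a quick way to check the result is to see whether the docker command is visible from a shell inside that distribution. This is a minimal sketch; have_cmd is a hypothetical helper name, and the messages are only suggestions.

```shell
#!/usr/bin/env bash
# have_cmd NAME: succeed when NAME resolves to a command in this shell.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd docker; then
  # Prints the daemon (server) version when the daemon is reachable.
  docker version --format '{{.Server.Version}}' \
    || echo "docker CLI found, but the daemon is not reachable"
else
  echo "docker not found; check the WSL settings in Docker"
fi
```

If the CLI is found but the daemon is not reachable, Docker is usually not running on the Windows side yet.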

Installing the CUDA Toolkit

Run the following inside WSL. This example uses Ubuntu-18.04 downloaded from the Microsoft Store.

sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo sh -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/cuda.list'
sudo apt-get update
sudo apt-get install -y cuda-toolkit-11-0
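After the install, nvcc may not be on your PATH yet. The sketch below assumes the toolkit is reachable under /usr/local/cuda (the same prefix used for the samples in the next step) and that the compiler lives in its bin directory. The helper name path_prepend is made up for illustration.

```shell
#!/usr/bin/env bash
# path_prepend DIR: put DIR at the front of PATH, but only if DIR exists
# and is not already listed (so sourcing this twice does not duplicate it).
path_prepend() {
  [ -d "$1" ] || return 0
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, nothing to do
    *) PATH="$1:$PATH" ;;
  esac
}

path_prepend /usr/local/cuda/bin   # assumption: default install prefix
export PATH

if command -v nvcc >/dev/null 2>&1; then
  nvcc --version
else
  echo "nvcc is not on PATH; check where the toolkit was installed"
fi
```

Placing the function and the path_prepend call in ~/.bashrc would make the setting persistent across shells.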

Testing CUDA

cd /usr/local/cuda/samples/4_Finance/BlackScholes
make
./BlackScholes

If the output looks like the following, everything is working.

GPU Device 0: "Turing" with compute capability 7.5

Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
...generating input data in CPU mem.
...copying input data to GPU mem.
Data init done.

Executing Black-Scholes GPU kernel (512 iterations)...
Options count             : 8000000
BlackScholesGPU() time    : 0.723174 msec
Effective memory bandwidth: 110.623468 GB/s
Gigaoptions per second    : 11.062347

BlackScholes, Throughput = 11.0623 GOptions/s, Time = 0.00072 s, Size = 8000000 options, NumDevsUsed = 1, Workgroup = 128

Reading back GPU results...
Checking the results...
...running CPU calculations.

Comparing the results...
L1 norm: 1.741792E-07
Max absolute error: 1.192093E-05

Shutting down...
...releasing GPU memory.
...releasing CPU memory.
Shutdown done.

[BlackScholes] - Test Summary

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Note that it is normal for the nvidia-smi command not to work.
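Since nvidia-smi is not available, one way to script the check is to look for the "GPU Device" line in the sample's output. The sketch below assumes the line has exactly the shape shown in the output above; compute_capability is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# compute_capability: read sample output on stdin and print the compute
# capability from the first 'GPU Device N: "..."' line.
# The trailing "grep ." makes the function fail when no such line was found.
compute_capability() {
  sed -n 's/^GPU Device [0-9][0-9]*: ".*" with compute capability \([0-9][0-9.]*\)$/\1/p' \
    | head -n1 \
    | grep .
}

# Usage (from the sample directory):
#   ./BlackScholes | compute_capability
```

A non-zero exit status means the sample never reported a GPU, which is a quick signal that the driver or the WSL kernel needs another look.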

Installing the NVIDIA Container Toolkit

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
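To see whether the package registered its runtime with Docker, you can look for an "nvidia" entry in the daemon configuration. This is a rough sketch: the path /etc/docker/daemon.json is assumed to be the default location, has_nvidia_runtime is a made-up helper, and the check is a plain text match rather than real JSON parsing.

```shell
#!/usr/bin/env bash
# has_nvidia_runtime FILE: succeed when FILE is readable and mentions
# a "nvidia" key or value (text match only, no JSON validation).
has_nvidia_runtime() {
  [ -r "$1" ] && grep -q '"nvidia"' "$1"
}

cfg=/etc/docker/daemon.json   # assumption: default daemon config path

if has_nvidia_runtime "$cfg"; then
  echo "nvidia runtime is listed in $cfg"
else
  echo "no nvidia runtime found in $cfg"
fi
```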

Start the Docker service

sudo service docker restart

Testing Docker

docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark 

If the output looks like the following, everything is working.

Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
MapSMtoCores for SM 7.5 is undefined.  Default to use 64 Cores/SM
GPU Device 0: "GeForce GTX 1650" with compute capability 7.5

> Compute 7.5 CUDA device: [GeForce GTX 1650]
16384 bodies, total time for 10 iterations: 25.868 ms
= 103.772 billion interactions per second
= 2075.440 single-precision GFLOP/s at 20 flops per interaction

If you see the following error message when using NVIDIA driver >= 465.42:

docker: Error response from daemon: OCI runtime create failed: 
container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: 
cuda>=11.2, please update your driver to a newer version, or use an earlier cuda container: unknown.

update the driver to NVIDIA driver >= 470.76.
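If you run the test from a script, you can detect this particular failure by looking for the "unsatisfied condition" text in Docker's error output. The sketch below assumes the message has the wording shown above; cuda_requirement is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# cuda_requirement: read docker's error text on stdin and print the CUDA
# version the image asks for (e.g. "11.2"). Fails when the message is absent.
# Newlines are removed first because the message may wrap across lines.
cuda_requirement() {
  tr -d '\n' \
    | sed -n 's/.*unsatisfied condition: *cuda>=\([0-9][0-9.]*\).*/\1/p' \
    | grep .
}

# Usage:
#   docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark 2>&1 \
#     | cuda_requirement && echo "driver too old for this image"
```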

Ref: Issue

Q&A

  • Error: only 0 Devices available, 1 requested. Exiting.
    • Reboot.
  • IP address of the Windows host
    • cat /etc/resolv.conf | grep nameserver | awk '{ print $2 }'
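The one-liner above can also be wrapped in a small function so that the address is easy to reuse in scripts. This is a sketch under the same assumption as the original command, namely that the first nameserver entry in /etc/resolv.conf points at the Windows host. The names windows_host_ip and WIN_HOST are made up for illustration.

```shell
#!/usr/bin/env bash
# windows_host_ip [FILE]: print the address from the first "nameserver"
# line of FILE (default: /etc/resolv.conf).
windows_host_ip() {
  awk '$1 == "nameserver" { print $2; exit }' "${1:-/etc/resolv.conf}"
}

if [ -r /etc/resolv.conf ]; then
  WIN_HOST="$(windows_host_ip)"
  echo "Windows host: $WIN_HOST"
fi
```

Matching on the first field keeps commented-out lines such as "# nameserver ..." from being picked up.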

References

1. https://ocdevel.com/blog/20201207-wsl2-gpu-docker
2. https://docs.nvidia.com/cuda/wsl-user-guide/index.html#installing-nvidia-docker
3. https://docs.microsoft.com/zh-cn/windows/wsl/install-win10
4. https://developer.nvidia.com/blog/announcing-cuda-on-windows-subsystem-for-linux-2/
5. https://docs.docker.com/docker-for-windows/wsl/

Dong Wang

A final-year master's student in computer science at Uppsala University in Sweden. I am interested in deep learning, computer vision, and optimization. I am actively looking for a Ph.D. position.
