Docker: torch.cuda.is_available returns False and nvidia-smi is not working

aslu98 · August 9, 2020, 10:16am

I’m trying to build a Docker image that can run on GPUs. This is my situation:

I have Python 3.6 and am starting from the image nvidia/cuda:10.0-cudnn7-devel.
Torch does not see my GPUs.

nvidia-smi is not working either; it returns this error:

Failed to initialize NVML: Unknown Error
The command '/bin/sh -c nvidia-smi' returned a non-zero code: 255

I installed the NVIDIA CUDA toolkit and nvidia-smi with:

RUN apt install nvidia-cuda-toolkit -y
RUN apt-get install nvidia-utils-410 -y

ptrblck · August 10, 2020, 10:01am

How did you execute the container?
Note that you would need the nvidia docker runtime to be able to use GPUs inside the container.
Older docker versions used:

nvidia-docker run container

while newer ones can be started via:

docker run --gpus all container
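
As a rough illustration of the two launch styles above, here is a minimal Python sketch that assembles a one-off GPU check to run in a fresh container. The image name my-gpu-image and the helper names are hypothetical, and the sketch assumes the image has a python executable with torch installed; --rm is only there to clean up the container afterwards.

```python
import shlex
import subprocess


def build_gpu_check_command(image, use_legacy_wrapper=False):
    """Return the argv list that starts a container and asks torch about CUDA.

    `image` is a placeholder for your own image name (hypothetical here).
    Assumes `python` and `torch` are available inside that image.
    """
    check = "import torch; print(torch.cuda.is_available())"
    if use_legacy_wrapper:
        # Older setups: the nvidia-docker wrapper script.
        launcher = ["nvidia-docker", "run"]
    else:
        # Newer Docker versions with the NVIDIA runtime installed on the host.
        launcher = ["docker", "run", "--gpus", "all"]
    return launcher + ["--rm", image, "python", "-c", check]


def run_gpu_check(image, use_legacy_wrapper=False):
    """Run the check; raises CalledProcessError if the container exits non-zero."""
    cmd = build_gpu_check_command(image, use_legacy_wrapper)
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the command here; call run_gpu_check() on a host with Docker.
    cmd = build_gpu_check_command("my-gpu-image")
    print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the command first makes it easy to copy it into a terminal and confirm that the GPU flag is really being passed before debugging anything inside the image.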

aslu98 · August 18, 2020, 9:53am

I used this command.

I solved my problem and forgot to come back to this question. The problem was that it is not possible to check the availability of the GPUs while building an image, as I was doing above by running nvidia-smi in a RUN step.
Once I started a container from the image and called torch.cuda.is_available(), it returned True, so I figured out the problem was not the installed packages but the way I was checking whether they were working.
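
For anyone landing here later, this is a minimal sketch of the kind of check that belongs in a running container rather than in the Dockerfile. The function name check_gpu_runtime and the report keys are my own (hypothetical), and it assumes the container was started with GPU access; if torch or nvidia-smi is missing, the corresponding entries simply stay False.

```python
import importlib.util
import shutil
import subprocess


def check_gpu_runtime():
    """Collect GPU visibility info from inside a *running* container.

    During `docker build` the GPUs are not exposed, so run this at
    container start instead of in a RUN step.
    """
    report = {
        "nvidia_smi_found": False,
        "nvidia_smi_ok": False,
        "torch_installed": False,
        "cuda_available": False,
    }

    # nvidia-smi exits non-zero when NVML cannot be initialized.
    smi = shutil.which("nvidia-smi")
    if smi is not None:
        report["nvidia_smi_found"] = True
        try:
            result = subprocess.run(
                [smi],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=30,
            )
            report["nvidia_smi_ok"] = result.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            pass

    # Only import torch if it is installed, so the script works without it.
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch
        except ImportError:
            return report
        report["torch_installed"] = True
        report["cuda_available"] = bool(torch.cuda.is_available())

    return report


if __name__ == "__main__":
    for key, value in check_gpu_runtime().items():
        print(f"{key}: {value}")
```

Running this as the container's command (or from a shell inside it) reports on the actual runtime environment, which a build-time check cannot do.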