I’m trying to build a Docker image that can use GPUs. This is my situation:
I have Python 3.6 and I am starting from the image nvidia/cuda:10.0-cudnn7-devel.
Torch does not see my GPUs.
nvidia-smi does not work either; it returns the error:
Failed to initialize NVML: Unknown Error
The command ‘/bin/sh -c nvidia-smi’ returned a non-zero code: 255
I installed the NVIDIA CUDA toolkit and nvidia-smi with:
RUN apt install nvidia-cuda-toolkit -y
RUN apt-get install nvidia-utils-410 -y
How did you execute the container?
Note that you need the NVIDIA Docker runtime to use GPUs inside the container.
Older Docker versions used:
nvidia-docker run container
while newer ones (Docker 19.03 and later) can start it via:
docker run --gpus all container
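To make the difference between the two invocations concrete, here is a minimal sketch that builds the argument list for either style. The helper name gpu_run_command and the image name "container" are hypothetical; it assumes either the legacy nvidia-docker wrapper or Docker 19.03+ with the NVIDIA container runtime installed on the host.

```python
def gpu_run_command(image, legacy_nvidia_docker=False, gpus="all"):
    """Return the argv needed to start `image` with GPU access (hypothetical helper).

    legacy_nvidia_docker=True  -> older setups that use the nvidia-docker wrapper.
    legacy_nvidia_docker=False -> Docker 19.03+ with the NVIDIA runtime, via --gpus.
    """
    if legacy_nvidia_docker:
        # The wrapper injects the NVIDIA runtime itself; no extra flag is needed.
        return ["nvidia-docker", "run", image]
    # Newer Docker exposes GPUs through the --gpus flag ("all" = every GPU).
    return ["docker", "run", "--gpus", gpus, image]


if __name__ == "__main__":
    # Only prints the command; it does not start anything.
    print(" ".join(gpu_run_command("container")))
    print(" ".join(gpu_run_command("container", legacy_nvidia_docker=True)))
```

To actually start the container, the returned list can be passed to subprocess.run. Returning a list rather than a shell string avoids quoting problems with image names.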
I used this command.
I solved my problem and forgot to come back to this question. The problem was that GPU availability cannot be checked while the image is being built, which is what I was doing with the RUN nvidia-smi step above: by default, GPUs are only exposed to a running container started with the NVIDIA runtime, not to the build steps.
Once I started a container from the image and ran torch.cuda.is_available(), it returned True, and I realized the problem was not the installed packages but the way I was checking whether they worked.
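Since the check has to happen at run time, one option is to move it out of a RUN step and into a small script that the container executes on start. The following is a minimal sketch, assuming a hypothetical file check_gpu.py copied into the image and launched with something like CMD ["python3", "check_gpu.py"], then started with docker run --gpus all. It is written to work on Python 3.6 and degrades gracefully if nvidia-smi or torch is missing.

```python
import shutil
import subprocess


def gpu_report():
    """Report GPU visibility. Intended to run inside a *running* container,
    not during `docker build`, where GPUs are not exposed by default."""
    report = {"nvidia_smi": False, "torch_cuda": False, "device_count": 0}

    # nvidia-smi -L lists the GPUs the driver can see; exit code 0 means success.
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "-L"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,  # Python 3.6 spelling of text=True
        )
        report["nvidia_smi"] = result.returncode == 0

    # Check what PyTorch sees, if it is installed.
    try:
        import torch
    except ImportError:
        return report
    report["torch_cuda"] = bool(torch.cuda.is_available())
    report["device_count"] = int(torch.cuda.device_count())
    return report


if __name__ == "__main__":
    print(gpu_report())
```

Keeping the check in the container's start command rather than in the Dockerfile build means the same image builds fine on machines without GPUs, and the report reflects whatever GPUs docker run actually exposes.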