torch.cuda.is_available() is False but torch.version.cuda is 10.1

I installed PyTorch 1.3 from conda inside an Ubuntu container.

Then I ran the Docker container on a host machine with CUDA 10.1 and driver 418.

>>> torch.version.cuda
'10.1.243'
>>> torch.cuda.is_available()
False
>>> torch.backends.cudnn.enabled
True
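If it helps with debugging, here is a minimal sketch (not part of the original post) that forces CUDA initialization so PyTorch raises the underlying error instead of is_available() silently returning False:

import torch

print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("visible devices:", torch.cuda.device_count())

try:
    # explicit init surfaces the driver/runtime error hiding behind is_available() == False
    torch.cuda.init()
    x = torch.zeros(1, device="cuda")
    print("CUDA tensor created:", x)
except Exception as err:
    print("CUDA initialization failed:", err)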

Could you post the log of the installation, please?

I installed it using the official conda install command:

conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

The install was pretty normal (I will try to get the installation log, but it may not be very informative).
I also tried the official pytorch/pytorch:latest Docker image, but the result was the same: torch could report the CUDA version, yet torch.cuda.is_available() was still False.

The host machine I’m using doesn’t have nvidia-docker and doesn’t support the NVIDIA container runtime. The GPU devices and the libcuda* libraries were mounted manually (when I ran nvidia-smi inside the container, it found the GPUs, the driver version, and the CUDA version).
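One way to check whether the manually mounted driver library is actually usable from inside the container, independent of PyTorch, is to load it directly. A sketch, assuming libcuda.so.1 is on the dynamic loader path:

import ctypes

try:
    libcuda = ctypes.CDLL("libcuda.so.1")
except OSError as err:
    print("could not load libcuda.so.1:", err)
else:
    # cuInit returns 0 on success, a non-zero CUDA error code otherwise
    ret = libcuda.cuInit(0)
    count = ctypes.c_int(0)
    if ret == 0:
        libcuda.cuDeviceGetCount(ctypes.byref(count))
    print("cuInit returned", ret, "- device count:", count.value)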

Are there environment variables that PyTorch searches for in order to bind to CUDA devices?
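For reference, a sketch of the variables that commonly matter for CUDA visibility inside a container: CUDA_VISIBLE_DEVICES filters which devices the CUDA runtime sees (an empty value hides all GPUs), LD_LIBRARY_PATH controls whether the mounted libcuda can be found, and NVIDIA_VISIBLE_DEVICES is only honored by the NVIDIA container runtime.

import os

for var in ("CUDA_VISIBLE_DEVICES", "LD_LIBRARY_PATH", "NVIDIA_VISIBLE_DEVICES"):
    print(var, "=", os.environ.get(var, "<unset>"))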

Are you using plain docker without the nvidia-runtime then?
Would it be possible to install nvidia-docker on this machine?
I’m no expert with Docker, but I wouldn’t recommend trying to mount the GPUs manually.

Hi ptrblck,

Thanks!
Unfortunately, I don’t have permission to customize the host machine.
I also ran another check:

torch.cuda.is_driver_compatible() = True

I can see nvidia0, nvidia1, … in the /dev directory, but somehow PyTorch cannot find them.
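One thing worth verifying in this kind of setup (an assumption about manual mounts in general, not a statement about this particular host): nvidia-smi only needs /dev/nvidiactl and the per-GPU nodes, while CUDA applications typically also need /dev/nvidia-uvm, which is easy to miss when mounting devices by hand. A quick check:

import os

for node in ("/dev/nvidiactl", "/dev/nvidia-uvm", "/dev/nvidia0", "/dev/nvidia1"):
    print(node, "present" if os.path.exists(node) else "MISSING")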

I encountered the same problem. Have you solved it yet? Thanks.

The problem was solved by upgrading the infrastructure.