CUDA version mismatch

harshildarji · May 22, 2020, 1:59pm

When I run my scripts on a remote GPU server, they take more than twice as long as they normally do on Google Colab.

The CUDA version on the remote server is 9.1.x, while the PyTorch build I installed requires 10.2.

Could this version mismatch be causing the slowdown?

Is it possible to upgrade CUDA on the remote server?

Kushaj · May 22, 2020, 5:06pm

Remove PyTorch from the server. Then run !conda install pytorch torchvision cudatoolkit=10.2 -c pytorch in Jupyter to install PyTorch with CUDA 10.2. If you use pip instead, run !pip install torch torchvision.

harshildarji · May 22, 2020, 5:21pm

I already did that, but cat /usr/local/cuda/version.txt still shows CUDA Version 9.1.85.

Kushaj · May 22, 2020, 5:25pm

What is the output of torch.version.cuda?

harshildarji · May 22, 2020, 5:26pm

The output of torch.version.cuda is 10.2.

Kushaj · May 22, 2020, 5:27pm

So PyTorch is using CUDA 10.2.
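To see everything PyTorch reports in one place, something like the following may help. It is only a minimal sketch: cuda_report is a hypothetical helper name, not part of PyTorch, and it assumes the check runs in the same environment as the training script. torch.version.cuda is the CUDA version the installed binary was built with, while torch.cuda.is_available() indicates whether a GPU can actually be used.

```python
import importlib.util


def cuda_report():
    """Collect the CUDA details PyTorch itself reports, if PyTorch is installed.

    cuda_report is a hypothetical helper name used for illustration only.
    """
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the PyTorch binary was built with (None for CPU-only builds)
        "built_with_cuda": torch.version.cuda,
        # True only if PyTorch can actually use a GPU on this machine
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        # Name of the first visible GPU, useful when comparing against Colab
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If built_with_cuda shows 10.2 but cuda_available is False, the training is probably running on the CPU, which alone could explain a large slowdown.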

Kushaj · May 22, 2020, 5:28pm

Also, conda installs CUDA inside the Anaconda directory where all conda packages are stored, so checking the system-wide CUDA version will not tell you what PyTorch uses.

harshildarji · May 22, 2020, 5:30pm

Well, I agree, but why doesn't it show up under /usr/local/ the way 9.1.x does?
Also, training still takes much longer than on Google Colab.

Kushaj · May 22, 2020, 5:31pm

If you are using conda, then CUDA is not installed in /usr/local/. It lives inside the conda environment, in ~anaconda/envs/{something}.
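One way to check where the CUDA runtime used by the environment actually lives is to search the active environment instead of /usr/local/. The sketch below rests on a few assumptions: find_cudart_candidates is a hypothetical helper name, the libcudart* file name pattern is the usual Linux naming, and the two directories searched (the environment's lib folder and the lib folder next to the torch package) are common locations, not guaranteed ones.

```python
import glob
import importlib.util
import os
import sys


def find_cudart_candidates():
    """Return paths of CUDA runtime libraries visible to this Python environment.

    Hypothetical helper: the searched locations are assumptions about typical
    conda and pip layouts on Linux.
    """
    # Library directory of the active environment (sys.prefix is the env root)
    search_dirs = [os.path.join(sys.prefix, "lib")]

    spec = importlib.util.find_spec("torch")
    if spec is not None and spec.origin:
        # pip wheels typically ship their CUDA libraries next to the package
        search_dirs.append(os.path.join(os.path.dirname(spec.origin), "lib"))

    found = []
    for directory in search_dirs:
        found.extend(glob.glob(os.path.join(directory, "libcudart*")))
    return sorted(set(found))


if __name__ == "__main__":
    print("environment root:", sys.prefix)
    for path in find_cudart_candidates():
        print(path)
```

An empty result does not prove CUDA is missing. It only means the runtime is not in the places this sketch looks.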

Kushaj · May 22, 2020, 5:38pm

There may be other reasons for the slowdown: a slower CPU, a weaker GPU, or slower storage.
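To narrow down which of these it is, it may help to time the same fixed workload on both the remote server and Colab. The following is a rough sketch, not a proper benchmark: time_matmul is a hypothetical helper, and the matrix size and repeat count are arbitrary choices. It times only raw matrix multiplication, so it says nothing about data loading or storage speed.

```python
import importlib.util
import time


def time_matmul(size=2048, repeats=10):
    """Return (average seconds per matmul, device type), or None without PyTorch.

    Hypothetical helper: size and repeats are arbitrary illustrative defaults.
    """
    if importlib.util.find_spec("torch") is None:
        return None

    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)

    torch.matmul(a, b)  # warm-up run, excluded from the timing
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for the warm-up kernel to finish

    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    if device.type == "cuda":
        # GPU kernels run asynchronously, so wait before stopping the clock
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return elapsed / repeats, device.type


if __name__ == "__main__":
    print(time_matmul())
```

If the numbers are close on both machines, the GPU itself is probably not the bottleneck, and the CPU or storage would be the next things to look at. If the reported device is cpu, the GPU is not being used at all.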