PyTorch with CUDA 9.1 is super slow compared to 9.0

I was just upgrading my PyTorch install to the CUDA 9.1 build that was recently released and noticed that training has become extremely slow: approximately 100 times slower than before the upgrade.

For the upgrade, I ran conda uninstall pytorch and then installed the CUDA 9.1 version via conda install pytorch torchvision cuda91 -c pytorch.

I then just reversed the process: conda uninstall pytorch followed by conda install pytorch torchvision cuda90 -c pytorch. The speed is back to “normal” now.

In all my scripts, I am including a line

print(torch.cuda.is_available())

which evaluated to True in all of these cases. However, since the speed with the CUDA 9.1 install roughly matched the speed I get when I run the code locally on a non-GPU laptop, I assume this install was not actually using the GPU. Could it be that the card is incompatible with CUDA 9.1? Is there a way to check that other than torch.cuda.is_available()?
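For example, a quick timing comparison like the following (just a rough sketch; the sizes are arbitrary) should reveal whether the GPU is actually doing the work:

import time
import torch

# Time the same matmul on CPU and GPU; if the "GPU" run is not much
# faster, the CUDA build is probably falling back to the CPU.
x_cpu = torch.randn(4096, 4096)

start = time.time()
for _ in range(10):
    _ = x_cpu @ x_cpu
print("CPU:", time.time() - start)

if torch.cuda.is_available():
    x_gpu = x_cpu.cuda()
    torch.cuda.synchronize()  # wait for the transfer before timing
    start = time.time()
    for _ in range(10):
        _ = x_gpu @ x_gpu
    torch.cuda.synchronize()  # wait for the kernels to finish
    print("GPU:", time.time() - start)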

What is the output of print(torch.version.cuda) for your cuda91 installation?
Also, can you see any activity on your GPU in nvidia-smi?
Which GPU are you using?
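Something like this collects those checks in one place (a rough sketch):

import torch

print(torch.version.cuda)             # CUDA version the binaries were built against
print(torch.cuda.is_available())      # whether a usable GPU was detected
print(torch.cuda.get_device_name(0))  # which card PyTorch actually sees

# ...then watch nvidia-smi in a second terminal while the training script runs.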

It shows the installed version correctly, i.e., ‘9.1.85’, and yeah, via nvidia-smi I can’t detect any GPU use on that device. The card I am using is an NVIDIA Tesla K80. Maybe I should try installing PyTorch from source. I was just curious whether there is a way to determine if the CUDA libraries shipped with the binary PyTorch installers are compatible with the graphics card / graphics card drivers.
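For example, I can at least query the card’s compute capability directly (a K80 should report (3, 7), if I remember correctly):

import torch

# (major, minor) compute capability of device 0
print(torch.cuda.get_device_capability(0))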

We already do such a check here:
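Roughly speaking, the check compares each visible device’s compute capability against the minimum the binaries were built for and prints a warning. A paraphrased sketch (not the exact source; the minimum shown here is an assumption):

import torch

MIN_CAPABILITY = (3, 5)  # assumed minimum for the cuda91 binaries

def check_capability():
    for d in range(torch.cuda.device_count()):
        major, minor = torch.cuda.get_device_capability(d)
        if (major, minor) < MIN_CAPABILITY:
            name = torch.cuda.get_device_name(d)
            print("Warning: GPU %d (%s) has compute capability %d.%d, "
                  "below what these binaries support." % (d, name, major, minor))

check_capability()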

Ah, thanks @smth! I must say it’s a very weird case: I never got a warning when installing or importing it. And despite detecting the CUDA 9.1 lib, it runs on the CPU (0% utilization when I check via nvidia-smi). When I install the CUDA 8.0 PyTorch version (also via conda), it works just fine with 100% utilization.

@rasbt Did you know that you can still use CUDA 9.1 with your Tesla K80?

All you need to do is compile PyTorch from source for your GPU’s compute capability.

If you need more information, I can share the instructions that brought my old GTX 760 back to life with CUDA 9.1 - just drop me a line. It would be an honor to help!

@smth Why did you stop supporting GPUs with CUDA compute capability 3.0? I fixed it by adding 3.0 to this environment variable:
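Assuming the variable in question is TORCH_CUDA_ARCH_LIST (the one the from-source build reads to decide which compute capabilities to compile for), the fix looks like this:

import os

# Assumption: TORCH_CUDA_ARCH_LIST is the variable referenced above.
# Set it before running the build; equivalent to
# export TORCH_CUDA_ARCH_LIST="3.0" in the shell, then `python setup.py install`.
os.environ["TORCH_CUDA_ARCH_LIST"] = "3.0"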

@smth Or is this fix too naive? I don’t know. Could you clarify, please?

At least linear regression seems to work perfectly (apart from a warning about my old GTX 760).
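The test was along these lines (a minimal sketch, not my exact script):

import torch

# Fit y = 2x + 1 with SGD on the GPU as a smoke test.
device = torch.device("cuda")
x = torch.randn(1000, 1, device=device)
y = 2 * x + 1 + 0.1 * torch.randn(1000, 1, device=device)

model = torch.nn.Linear(1, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(model.weight.item(), model.bias.item())  # should approach 2.0 and 1.0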

@GarrisonD we wanted to start using some CUDA instructions that aren’t available on compute capability 3.0 and below. What you are doing is correct, and it will probably keep working for now, until we introduce CUDA routines that use features available only on 3.5 and above.

Ohh, now everything is clear! Thank you!

What is the best solution for me? Stay with CUDA 8?

Thanks! I got too busy to fiddle with it further and have been sticking with CUDA 8.0 on that machine since then. I believe it was due to the drivers for the K80s, but I should give it another try! Maybe your notes will come in handy :)