A100 is slower than 1080ti with pytorch

First, I found out that the A100 requires at least:

  • CUDA 11.0
  • NVIDIA driver 450

So, when using the A100, I set up the environment as follows:

  • CUDA 11.0
  • cuDNN 8.0.5 (8005, checked with torch.backends.cudnn.version())
  • NVIDIA driver 450 (more precisely, 450.119.04)
  • PyTorch 1.8.0
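A quick sanity check of the versions PyTorch actually sees can rule out a mismatched install; this is a minimal sketch to run on the A100 machine (it assumes nothing beyond PyTorch itself, and falls back gracefully if torch is missing):

```python
# Print the CUDA / cuDNN versions this PyTorch build was compiled against,
# plus the detected GPU, to confirm they match the environment listed above.
try:
    import torch

    print(torch.__version__)                # e.g. 1.8.0+cu111
    print(torch.version.cuda)               # CUDA toolkit of the wheel
    print(torch.backends.cudnn.version())   # e.g. 8005 -> cuDNN 8.0.5
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))  # should report the A100
except ImportError:
    print("torch is not installed in this environment")
```

Note that torch.version.cuda reports the toolkit the wheel was built with, which can differ from the system-wide CUDA install.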

I installed PyTorch with:
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html

However, training takes far longer than expected.
For example, the model used in the experiment is a complex model with a size of 7.2M.
One epoch (18,480 samples) took 1600 seconds on the 1080 Ti, but it takes 2000 seconds on the A100.

I suspect this is a cuDNN version problem, and I want to change cuDNN to 8.0.2 or 8.0.3.
How can I change only the cuDNN version used by PyTorch?

Or is there any other solution to this problem?
I’ve already tried setting torch.backends.cudnn.benchmark = True, but that made it twice as slow.
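One thing worth double-checking before blaming cuDNN is the timing methodology itself: CUDA kernel launches are asynchronous, so naive timestamps can measure launch overhead rather than actual compute. Below is a minimal, framework-agnostic timing sketch (the workload is a hypothetical stand-in for one training step; with CUDA you would additionally call torch.cuda.synchronize() before each timestamp):

```python
import time

def time_fn(fn, warmup=3, iters=10):
    """Average wall-clock time of fn over `iters` runs, excluding warmup.

    Warmup matters especially with cudnn.benchmark = True, since the
    first iterations include the algorithm-autotuning search. With CUDA,
    insert a device synchronization before reading each timestamp.
    """
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Hypothetical CPU workload standing in for one training step.
avg = time_fn(lambda: sum(i * i for i in range(100_000)))
print(f"avg step time: {avg:.6f}s")
```

If benchmark mode is timed without a warmup, its autotuning cost lands in the measurement and can make it look slower than it really is.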

I think it is indeed a cuDNN problem: you would be running into this known issue in the 1.8.0 release, so you should update to 1.9.0 or the nightly release. You might also want to build PyTorch from source against the latest cuDNN release (8.2.1).

Thank you for the reply!

Can I apply what you suggested with this command?

conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia

If so, that is actually what I tried first. But because the A100 was slow, I downgraded PyTorch to 1.8.0 and CUDA to 11.0.
Should I try again?

Your command should install the latest 1.9.0 release, which ships with the cutlass kernels that were missing in cuDNN 8.0.5. However, you might still get better performance by building from source with cuDNN 8.2, as mentioned before.
