I am experiencing slower distributed training with the new PyTorch 1.7 built with CUDA 11.0 compared to CUDA 10.2. Has anyone benchmarked this yet? I run the same script in two different environments, one with CUDA 11.0 and the other with CUDA 10.2. The same script that takes 21 hours per epoch on CUDA 10.2 takes 24 hours on CUDA 11.0.
My guess is that the potential slowdown is not coming from distributed training (and thus NCCL) nor from CUDA 11 itself, but might be coming from e.g. cudnn (which also depends on the device you are using).
Are you seeing the slowdown only with DDP, or also on a single device? The latter case would point towards my assumption.
Could you give more information about your setup (GPU, model architecture) and also profile the training on a single device?
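For the single-device profiling, something like the built-in autograd profiler should be enough to see where the time goes. A minimal sketch (the shapes here are placeholders, not your actual model; it falls back to CPU if no GPU is visible):

```python
import torch
import torch.nn.functional as F

# Profile a few conv2d calls on a single device and print the top ops.
use_cuda = torch.cuda.is_available()
device = 'cuda' if use_cuda else 'cpu'
x = torch.randn(10, 64, 128, 128, device=device)
w = torch.randn(64, 64, 5, 5, device=device)

with torch.autograd.profiler.profile(use_cuda=use_cuda) as prof:
    for _ in range(10):
        F.conv2d(x, w)

# Sort by self CPU time so the table also works without a GPU
print(prof.key_averages().table(sort_by='self_cpu_time_total', row_limit=10))
```

Running this in both environments and comparing the per-kernel times should show whether a specific cudnn kernel got slower.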
I’m having a similar issue with PyTorch 1.7 w/ CUDA 11.0 compared to CUDA 10.1. I’m using a 2080 Ti as the GPU.
Simple example that demonstrates this (only conv2d):
import torch
import torch.nn.functional as F

x = torch.randn(10, 64, 128, 128).cuda()
w = torch.randn(64, 64, 5, 5).cuda()
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

# warmup
y = []
for _ in range(10):
    y.append(F.conv2d(torch.randn_like(x), w))

# measure (elapsed_time reports milliseconds)
start.record()
y = []
for _ in range(10):
    y.append(F.conv2d(torch.randn_like(x), w))
end.record()
torch.cuda.synchronize()
print('time = %.2f ms' % start.elapsed_time(end))
Results:
# pytorch 1.7 w/ cuda 10.1
# time = 21.05 +/- 0.05
# pytorch 1.7 w/ cuda 11.0
# time = 25.40 +/- 0.05
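One thing worth ruling out is cudnn's default algorithm choice: if the CUDA 11 build picks a slower conv algorithm by default, enabling `torch.backends.cudnn.benchmark = True` lets cudnn autotune and may close the gap. A sketch re-timing the same shapes with benchmark mode on (this is a diagnostic guess, not a confirmed cause; it only runs the timed part if a GPU is present):

```python
import torch
import torch.nn.functional as F

# Let cudnn autotune the conv algorithm for these fixed shapes, then re-time.
torch.backends.cudnn.benchmark = True

if torch.cuda.is_available():
    x = torch.randn(10, 64, 128, 128, device='cuda')
    w = torch.randn(64, 64, 5, 5, device='cuda')
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    # warmup also triggers the autotuning pass
    for _ in range(10):
        F.conv2d(x, w)
    torch.cuda.synchronize()
    start.record()
    for _ in range(10):
        F.conv2d(x, w)
    end.record()
    torch.cuda.synchronize()
    print('time = %.2f ms' % start.elapsed_time(end))
```

If benchmark mode brings the CUDA 11.0 build back in line with 10.1, that would point to a cudnn heuristic regression rather than a general CUDA 11 slowdown. Note that benchmark mode only helps when input shapes are constant across iterations.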