RTX 3070 slower with mixed precision autocast in PyTorch 1.7

Hi. I’m using PyTorch 1.7 and getting very slow training speed with my new RTX 3070 whenever I enable torch.cuda.amp.autocast (about 2x slower than without it).
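For reference, this is roughly the pattern I’m toggling on and off (a minimal sketch with a placeholder model and random data, not my actual GAN code):

```python
import torch
import torch.nn as nn

# Minimal sketch of the autocast training step; the tiny model and random
# data are placeholders just to show the standard torch.cuda.amp pattern.
device = "cuda"
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(64, 3, 3, padding=1)).to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

for step in range(10):
    inputs = torch.randn(16, 3, 128, 128, device=device)
    targets = torch.randn(16, 3, 128, 128, device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # this is the flag I enable/disable
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()     # scaled backward pass
    scaler.step(optimizer)
    scaler.update()
```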

From what I’ve read, this problem should be solved in 1.7.1, because it incorporates the newer cuDNN versions, right? Is there any hope of its stable release being available by the end of this month?

Thank you for your attention.

cuDNN 8.0.5 ships with the updated heuristics for the 3090, and cuDNN 8.1.x will cover the complete 30xx series. You could try out the nightly PyTorch build, which already uses cuDNN 8.0.5, and check if the performance improves.
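As a quick sanity check after installing the nightly build, you can confirm which cuDNN version your PyTorch build is using (a small sketch relying only on standard torch APIs):

```python
import torch

# Report the versions the installed PyTorch build was compiled against,
# to confirm the nightly actually picks up cuDNN 8.0.5.
print("PyTorch:", torch.__version__)
print("CUDA:   ", torch.version.cuda)
print("cuDNN:  ", torch.backends.cudnn.version())   # e.g. 8005 for 8.0.5
print("GPU:    ", torch.cuda.get_device_name(0))
```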


Well, I’ve tried the nightly build with cuDNN 8.0.5, and now it’s a lot faster without AMP, but still something like 10% slower with AMP enabled.

I suppose AMP will only get full support when cuDNN 8.1.x is available, right? Is there an expected month for its release?

Thank you very much for your time and help.

Yes, for the 3070 cuDNN 8.1.x should contain the trained heuristics. I cannot give you a specific date, unfortunately. Could you share your model and input shapes so that I can check the performance with an internal version?

My code involves a GAN with multiple generators plus a generator classifier, and it’s pretty large, with lots of different classes and lots of argument options. I could still make a simplified version just to reproduce something like the experiment I described here, but I don’t know how condensed you are expecting it to be. I’m new to this forum; am I supposed to just post it here in the comments?

I am observing the same phenomenon running ResNet50 inference on an RTX 3070:
without autocast: 11 ms
with autocast: 16 ms
Interestingly, the GPU power consumption and CPU utilization are both lower with autocast, but inference is significantly slower on PyTorch 1.7.1.
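For what it’s worth, this is roughly how I measured it (a sketch using torchvision’s resnet50 with proper CUDA synchronization; the batch size and input resolution are assumptions on my part, not necessarily the exact setup behind the numbers above):

```python
import time
import torch
from torchvision.models import resnet50

# Rough timing sketch for ResNet50 inference with and without autocast.
device = "cuda"
model = resnet50().eval().to(device)
x = torch.randn(8, 3, 224, 224, device=device)

def benchmark(use_amp, iters=50):
    with torch.no_grad():
        # Warm-up so cuDNN algorithm selection does not skew the timing.
        for _ in range(10):
            with torch.cuda.amp.autocast(enabled=use_amp):
                model(x)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(iters):
            with torch.cuda.amp.autocast(enabled=use_amp):
                model(x)
        torch.cuda.synchronize()
    return (time.time() - start) / iters * 1000  # ms per iteration

print(f"without autocast: {benchmark(False):.1f} ms")
print(f"with autocast:    {benchmark(True):.1f} ms")
```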