Autograd.grad extremely slow

Does anyone have an intuition for why calling backward() through gradients computed with autograd.grad would be several hundred times slower than the forward pass? This is with PyTorch 1.4 and CUDA toolkit 10.1.243.
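
For reference, here is a minimal sketch of the kind of double-backward pattern being described (the model and loss below are placeholders, not the actual code from the post):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder model and input, only to illustrate the pattern.
model = torch.nn.Linear(10, 1).to(device)
x = torch.randn(32, 10, device=device, requires_grad=True)

out = model(x).sum()

# First-order gradient, keeping the graph so we can differentiate through it.
(grad_x,) = torch.autograd.grad(out, x, create_graph=True)

# A loss that depends on the gradient itself; backward() here is the
# "backward applied to autograd.grad" step.
penalty = grad_x.pow(2).sum()
penalty.backward()
```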

Hi,

That would depend a lot on your forward pass. It should not do that :smiley:
Do you have a code sample that reproduces this? If possible on colab (https://colab.research.google.com/notebook#create=true) so that we can easily test it.

And here’s a graph of the full network: (graph image not included)

I tracked the problem down to cudnn.benchmark. Only the first iteration is very slow; once that passes, the subsequent iterations take the expected amount of time.

Ah, right. This is expected then. With cudnn.benchmark enabled, the first time a new input size is encountered takes a very long time, because cuDNN benchmarks several algorithms to pick the fastest one.
If your model has many convolutions with different input sizes, it will do this many times, which can indeed be quite slow.
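
A rough illustration of that warm-up cost (a stand-in conv model, not the original network; assumes a CUDA device is available):

```python
import time
import torch

# Autotune convolution algorithms per input size.
torch.backends.cudnn.benchmark = True

model = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1).cuda()
x = torch.randn(8, 3, 224, 224, device="cuda")

for i in range(3):
    torch.cuda.synchronize()
    start = time.time()
    out = model(x).sum()
    out.backward()
    torch.cuda.synchronize()
    # Iteration 0 pays the cuDNN benchmarking cost; later ones do not.
    print(f"iteration {i}: {time.time() - start:.4f}s")
```

So when timing, either warm up with a few iterations first, or set `torch.backends.cudnn.benchmark = False` to avoid the one-time search (at the cost of possibly slower steady-state convolutions).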

Happy that you found the reason!