This is just to confirm my understanding of how autograd works, as I found the solution neither here nor here.
In the following setup m1, m2 and m3 are PyTorch Sequential models, l1 is a loss function, and lab is a tensor of labels. Some (or all) of m2's parameters have requires_grad=False:
import torch

input = torch.randn(10)
o1 = m1(input)
o2 = m2(o1)
o3 = m3(o2)
l = l1(o3, lab)
l.backward()
My question is: since part (or even all) of m2 does not compute gradients, will backward() automatically propagate the gradients from m3 through m2 to m1, or do I have to handle that myself outside of it?
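
For reference, here is a minimal self-contained sketch of the setup above (the Sequential models, loss function and labels are placeholders I made up for illustration) that prints whether m1's parameters end up with gradients after backward() while m2 stays frozen:

import torch
import torch.nn as nn

# Placeholder models standing in for the real m1, m2, m3
m1 = nn.Sequential(nn.Linear(10, 20), nn.ReLU())
m2 = nn.Sequential(nn.Linear(20, 20), nn.ReLU())
m3 = nn.Sequential(nn.Linear(20, 5))

# Freeze all of m2's parameters
for p in m2.parameters():
    p.requires_grad = False

l1 = nn.CrossEntropyLoss()
lab = torch.tensor([2])  # placeholder label

inp = torch.randn(1, 10)
o1 = m1(inp)
o2 = m2(o1)
o3 = m3(o2)
l = l1(o3, lab)
l.backward()

# Inspect which parameters received gradients
print([p.grad is not None for p in m1.parameters()])  # m1's parameters
print([p.grad is not None for p in m2.parameters()])  # m2's (frozen) parameters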
