This is just to confirm my understanding of how autograd works, as I found the solution neither here nor here.
In the following setup m1, m2 and m3 are PyTorch Sequential models, l1 is a loss function, and lab is a tensor of labels. Some (or all) of m2's parameters have requires_grad=False:
import torch

input = torch.randn(10)
o1 = m1(input)
o2 = m2(o1)
o3 = m3(o2)
l = l1(o3, lab)
l.backward()
My question is: since part (or even all) of m2 does not compute gradients, will backward() automatically propagate the gradients from m3 through m2 to m1, or do I have to handle that myself outside of it?
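
For reference, here is a minimal self-contained sketch of the setup above (the Sequential models, loss function and labels are placeholders I made up for illustration) that prints whether m1's parameters end up with gradients after backward() while m2 stays frozen:

import torch
import torch.nn as nn

# Placeholder models standing in for the real m1, m2, m3
m1 = nn.Sequential(nn.Linear(10, 20), nn.ReLU())
m2 = nn.Sequential(nn.Linear(20, 20), nn.ReLU())
m3 = nn.Sequential(nn.Linear(20, 5))

# Freeze all of m2's parameters
for p in m2.parameters():
    p.requires_grad = False

l1 = nn.CrossEntropyLoss()
lab = torch.tensor([2])  # placeholder label

inp = torch.randn(1, 10)
o1 = m1(inp)
o2 = m2(o1)
o3 = m3(o2)
l = l1(o3, lab)
l.backward()

# Inspect which parameters received gradients
print([p.grad is not None for p in m1.parameters()])  # m1's parameters
print([p.grad is not None for p in m2.parameters()])  # m2's (frozen) parameters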
