This is just to confirm my understanding of how autograd works, as I could not find the answer in the related threads I looked at.
In the following setup, m1, m2, and m3 are PyTorch Sequential models, l1 is a loss function, and lab are the labels. Parts of m2 have requires_grad=False:
```python
input = torch.randn(10)
o1 = m1(input)
o2 = m2(o1)
o3 = m3(o2)
l = l1(o3, lab)
l.backward()
```
My question is: since parts (or even all) of m2 do not compute gradients, will backward() automatically propagate the gradients from m3 back through m2 to m1, or do I have to handle that myself?
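For reference, here is a minimal self-contained version of the setup above that can be run to check the behavior. The layer sizes, the MSE loss, and freezing all of m2 (rather than only parts) are illustrative assumptions, not taken from the original snippet:

```python
import torch
import torch.nn as nn

# Small stand-in Sequential models; the sizes are arbitrary assumptions.
m1 = nn.Sequential(nn.Linear(10, 10), nn.ReLU())
m2 = nn.Sequential(nn.Linear(10, 10), nn.ReLU())
m3 = nn.Sequential(nn.Linear(10, 1))

# Freeze all of m2's parameters (the question freezes only parts of it).
for p in m2.parameters():
    p.requires_grad = False

l1 = nn.MSELoss()
lab = torch.randn(1)

x = torch.randn(10)
o1 = m1(x)
o2 = m2(o1)
o3 = m3(o2)
l = l1(o3, lab)
l.backward()

# Gradients flow through the frozen m2 back to m1's parameters...
print(all(p.grad is not None for p in m1.parameters()))
# ...while m2's own parameters accumulate no gradients.
print(all(p.grad is None for p in m2.parameters()))
```

Running this shows that m1's parameters do receive gradients while m2's do not: requires_grad=False on m2's parameters only skips computing gradients *with respect to those parameters*; autograd still backpropagates through m2's operations to reach m1.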