I think this touches on the concept of leaf variables versus intermediate variables. As far as I can see, in all three cases w is an intermediate (non-leaf) variable, and the gradients are accumulated in the torch.randn(..., requires_grad=True) tensor, which is one of the roots (leaves) of the computation graph. The gradients of all intermediate variables (including w) are freed during the backward() call. If you want to retain those gradients, call w.retain_grad() before calling backward().
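For illustration, here is a minimal sketch (the tensor shapes and operations are just placeholders, not taken from the original question) showing the difference between a leaf tensor's .grad and a non-leaf tensor with retain_grad():

```python
import torch

# Leaf tensor: created directly with requires_grad=True,
# so autograd populates x.grad during backward()
x = torch.randn(3, requires_grad=True)

# Intermediate (non-leaf) tensor: the result of an op on x
w = x * 2
w.retain_grad()  # ask autograd to keep w.grad after backward()

loss = (w ** 2).sum()
loss.backward()

print(x.is_leaf, x.grad)  # True, gradient is populated
print(w.is_leaf, w.grad)  # False, populated only because of retain_grad()
```

Without the w.retain_grad() line, w.grad would be None after backward(), since autograd only accumulates gradients into leaf tensors by default.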