Grad is None even when requires_grad=True

I think this touches on the distinction between leaf variables and intermediate (non-leaf) variables.
As far as I can see, in all three cases w is an intermediate variable, so the gradients are accumulated in the torch.randn(..., requires_grad=True) instance, which is one of the leaves of the computation graph. The gradients of all intermediate variables (including w) are not kept after the backward() call, which is why w.grad is None. If you want to retain those gradients, call w.retain_grad() before calling backward().
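
Here is a minimal sketch of the difference. The tensors x, w and v are made up for illustration and are not the code from the question; I am assuming w is built from a leaf with a simple multiplication.

```python
import warnings
import torch

# Leaf tensor: created directly by the user, so autograd accumulates into x.grad.
x = torch.randn(3, requires_grad=True)

# Intermediate (non-leaf) tensor: the result of an operation on x.
w = x * 2
w.retain_grad()  # ask autograd to keep w.grad after backward()

# A second intermediate tensor without retain_grad(), for comparison.
v = x * 3

loss = (w + v).sum()
loss.backward()

print(x.is_leaf, w.is_leaf, v.is_leaf)  # True False False
print(x.grad)  # tensor([5., 5., 5.])
print(w.grad)  # tensor([1., 1., 1.])

# Reading .grad on a non-leaf tensor emits a UserWarning, so silence it here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    v_grad = v.grad
print(v_grad)  # None
```

If you only need the gradient with respect to the original tensor, reading x.grad is enough and retain_grad() is not required.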
