I have this simple network, which is created from these lines:
```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
N, D_in, H, D_out = 64, 1000, 100, 10  # example sizes, assumed for illustration

x = torch.randn(N, D_in, device=device)
y = torch.randn(N, D_out, device=device)
w1 = torch.randn(D_in, H, device=device, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, requires_grad=True)

y_pred = x.mm(w1).clamp(min=0).mm(w2)
```
Using autograd and the chain rule, the gradients are generated from the root down to the leaves (here `w1` and `w2`, whose `requires_grad` is `True`).
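For context, this is how I trigger the backward pass (a minimal sketch; the squared-error loss is just an example I picked, not part of the network itself):

```python
loss = (y_pred - y).pow(2).sum()  # example scalar loss, assumed for illustration
loss.backward()

# After backward(), gradients are stored only on the leaf tensors:
print(w1.grad.shape)  # torch.Size([D_in, H])
print(w2.grad.shape)  # torch.Size([H, D_out])
```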
So my question is: what about the intermediate `MmBackward` and `ClampBackward` nodes? Shouldn't the gradients be stored somewhere there, to be used when calculating the gradient of `w1`? If so, how can I access them?
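To make the question concrete, here is where I see those nodes when I walk the graph from Python (the exact class names may differ by PyTorch version, e.g. `MmBackward0`):

```python
print(y_pred.grad_fn)
# <MmBackward0 object at 0x...>

# next_functions lists the nodes feeding this one: the clamp's backward
# node and the AccumulateGrad node that writes into w2.grad:
print(y_pred.grad_fn.next_functions)
# ((<ClampBackward... object at 0x...>, 0), (<AccumulateGrad object at 0x...>, 0))
```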
I tried looking at `Functions.h` in the generated folder, but as far as I can tell, `grad` is not an attribute of those function nodes.
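For what it's worth, from the Python side I can observe a gradient flowing through an intermediate value if I keep a handle on it and register a hook, which suggests the gradient does exist at that point during backward; I just don't see where (or whether) it is stored afterwards (a sketch, reusing the tensors defined above):

```python
# Rebuild the forward pass, keeping the intermediate explicitly:
h = x.mm(w1).clamp(min=0)  # output of the clamp (ReLU) step
h.register_hook(lambda g: print("grad w.r.t. the clamp output:", g.shape))

y_pred = h.mm(w2)
(y_pred - y).pow(2).sum().backward()
# The hook fires during backward, but afterwards h.grad is still None
# (unless h.retain_grad() was called before backward).
```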