I was wondering if it is at all possible to use `register_full_backward_hook` to get the gradient at an intermediate layer? What I mean by this is, say we have code like:
```python
X = torch.randn(B, N)        # get some inputs
Y = model(X)                 # calculate my output, Y (an R^N -> R^1 function)
loss = calc_loss(target, Y)  # calculate my loss
loss.backward()              # calculate gradients of the loss
```
Now suppose I use `register_full_backward_hook` to place a backward hook on each layer of my model. Say the model is feed-forward and composed purely of `torch.nn.Linear` modules. The hooks record the gradient of the loss w.r.t. the output of each linear layer, i.e. d(loss)/ds (where s is the output of a given linear layer within the network).
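For concreteness, here is a minimal sketch of the setup I'm describing (the layer sizes and the dictionary-based recording are just for illustration, not my real model):

```python
import torch
import torch.nn as nn

# Toy feed-forward model of purely Linear layers (sizes are illustrative)
model = nn.Sequential(nn.Linear(8, 16), nn.Linear(16, 16), nn.Linear(16, 1))

grads = {}  # records d(loss)/ds for each linear layer's output s

def make_hook(name):
    def hook(module, grad_input, grad_output):
        # grad_output[0] is the gradient of the loss w.r.t. this layer's output
        grads[name] = grad_output[0].detach()
    return hook

for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        module.register_full_backward_hook(make_hook(name))

X = torch.randn(4, 8)
Y = model(X)                # shape (4, 1)
loss = (Y ** 2).mean()      # stand-in for calc_loss(target, Y)
loss.backward()             # hooks fire and record d(loss)/ds per layer
```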
I was wondering if it's at all possible to get `register_full_backward_hook` to record the gradient of the model output (rather than the loss) with respect to the output of the linear layers, i.e. dY/ds? Is this possible with backward hooks, or are they purely restricted to gradients of the loss?
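To make the goal concrete, here is a sketch of the behaviour I'm after: the same hooks as before, but with backpropagation started from Y itself (summed over the batch) rather than from a loss, so that the hooks would see dY/ds. Whether this is the right/intended way to do it is exactly what I'm unsure about:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.Linear(16, 1))  # sizes are illustrative

dY_ds = {}  # what I'd like recorded: dY/ds per linear layer

def make_hook(name):
    def hook(module, grad_input, grad_output):
        dY_ds[name] = grad_output[0].detach()
    return hook

for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        module.register_full_backward_hook(make_hook(name))

X = torch.randn(4, 8)
Y = model(X)      # R^N -> R^1 per sample, so Y has shape (4, 1)
# Backprop from Y directly (no loss); since samples are independent,
# the hooks then record dY/ds rather than d(loss)/ds
Y.sum().backward()
```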