I was wondering if it is at all possible to use `register_full_backward_hook` to get the gradient at an intermediate layer? What I mean by this is, say we have code like:
```python
X = torch.randn(B, N)        # get some inputs
Y = model(X)                 # calculate my output, Y (an R^N -> R^1 function)
loss = calc_loss(target, Y)  # calculate my loss
loss.backward()              # calculate gradients of the loss
```
Now suppose I use `register_full_backward_hook` to place a backward hook on each layer of my model. Say the model is feed-forward and composed purely of `torch.nn.Linear` modules. The hooks record the gradient of the loss w.r.t. the output of each linear layer, i.e. d(loss)/ds (where s is the output of a given linear layer within the network).
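For concreteness, here is a minimal sketch of the setup I'm describing (the layer sizes and the dictionary-based recording are just for illustration, not my real model):

```python
import torch
import torch.nn as nn

# Toy feed-forward model of purely Linear layers (sizes are illustrative)
model = nn.Sequential(nn.Linear(8, 16), nn.Linear(16, 16), nn.Linear(16, 1))

grads = {}  # records d(loss)/ds for each linear layer's output s

def make_hook(name):
    def hook(module, grad_input, grad_output):
        # grad_output[0] is the gradient of the loss w.r.t. this layer's output
        grads[name] = grad_output[0].detach()
    return hook

for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        module.register_full_backward_hook(make_hook(name))

X = torch.randn(4, 8)
Y = model(X)                # shape (4, 1)
loss = (Y ** 2).mean()      # stand-in for calc_loss(target, Y)
loss.backward()             # hooks fire and record d(loss)/ds per layer
```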
I was wondering if it's at all possible to get `register_full_backward_hook` to record the gradient of the model output (rather than the loss) with respect to the output of the linear layers, i.e. dY/ds? Is this possible with backward hooks, or are they purely restricted to gradients of the loss?
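To make the goal concrete, here is a sketch of the behaviour I'm after: the same hooks as before, but with backpropagation started from Y itself (summed over the batch) rather than from a loss, so that the hooks would see dY/ds. Whether this is the right/intended way to do it is exactly what I'm unsure about:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.Linear(16, 1))  # sizes are illustrative

dY_ds = {}  # what I'd like recorded: dY/ds per linear layer

def make_hook(name):
    def hook(module, grad_input, grad_output):
        dY_ds[name] = grad_output[0].detach()
    return hook

for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        module.register_full_backward_hook(make_hook(name))

X = torch.randn(4, 8)
Y = model(X)      # R^N -> R^1 per sample, so Y has shape (4, 1)
# Backprop from Y directly (no loss); since samples are independent,
# the hooks then record dY/ds rather than d(loss)/ds
Y.sum().backward()
```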