Is it possible to use register_full_backward_hook() for the gradient of an intermediate layer?

Hi All,

I was wondering whether it's at all possible to use register_full_backward_hook to get the gradient at an intermediate layer? What I mean by this is, let's say we have code that goes like…

X = torch.randn(B, N)        # some inputs
Y = model(X)                 # model output Y (an R^N -> R^1 function)
loss = calc_loss(target, Y)  # compute my loss
loss.backward()              # compute gradients of the loss

Now suppose I use register_full_backward_hook to place a backward hook on the layers of my model. Let's say I have a feed-forward model composed purely of torch.nn.Linear modules. The hooks record the gradient of the loss w.r.t. the output of each linear layer, i.e. d(loss)/ds (where s is the output of a given linear layer within the network).
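
For reference, here's a minimal sketch of the setup I'm describing (the model, sizes, and loss below are placeholders, not my actual code):

import torch
import torch.nn as nn

B, N = 8, 16
model = nn.Sequential(nn.Linear(N, 32), nn.ReLU(), nn.Linear(32, 1))

grads = {}  # will hold d(loss)/ds for each hooked linear layer

def make_hook(name):
    def hook(module, grad_input, grad_output):
        # grad_output[0] is the gradient of the loss w.r.t. this layer's output s
        grads[name] = grad_output[0].detach()
    return hook

for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        module.register_full_backward_hook(make_hook(name))

X = torch.randn(B, N)
Y = model(X)
loss = ((Y - torch.randn(B, 1)) ** 2).mean()  # placeholder loss
loss.backward()  # the hooks fire here and fill `grads` with d(loss)/ds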

I was wondering if it's at all possible to get register_full_backward_hook to record the gradient of the model output (rather than the loss) with respect to the output of the linear layers, i.e. dY/ds? Is this at all possible with backward hooks, or are they restricted purely to the gradient coming from the loss?
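
For clarity, the quantity I'm after is dY/ds, which I know I could compute directly with autograd if I kept an intermediate output around, something like this (again, the model here is just a placeholder):

import torch
import torch.nn as nn

B, N = 8, 16
first = nn.Linear(N, 32)                           # an intermediate linear layer
rest = nn.Sequential(nn.ReLU(), nn.Linear(32, 1))  # rest of the network

X = torch.randn(B, N)
s = first(X)   # output of the intermediate layer
Y = rest(s)    # model output (R^N -> R^1 per sample)
# dY/ds, obtained by summing Y over the batch (each sample's Y depends only on its own s)
dY_ds = torch.autograd.grad(Y.sum(), s)[0]

My question is whether the backward hooks themselves can be made to record this dY/ds rather than d(loss)/ds.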

Thank you!