Is it possible to access the gradient update of a specific layer during standard backprop?
I have tried setting layer.weight.retain_grad() for the layer of interest and then reading layer.weight.grad after the backward pass. As I understand it, though, that gives the gradient of the loss with respect to the layer's parameters (dL/dtheta_i), whereas what I am interested in is the gradient of the loss with respect to the layer's activations (dL/dh_i). More specifically, if the update rule for the parameters of a given layer is:
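A minimal sketch of what I tried (the model, layer sizes, and data here are made up just for illustration):

```python
import torch
import torch.nn as nn

# Toy two-layer model; the real model and shapes don't matter here
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
layer = model[0]                # layer of interest
layer.weight.retain_grad()      # what I tried (no-op? weight is already a leaf)

x = torch.randn(16, 4)
loss = model(x).sum()
loss.backward()

# This is populated, but it is the gradient w.r.t. the parameters,
# not the gradient w.r.t. the layer's activations:
print(layer.weight.grad.shape)  # (8, 4), same shape as the weight matrix
```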
theta_i = theta_i - eta * (dL / dh_i) * (dh_i / dtheta_i)
then is it possible to get the values of (dL / dh_i) that are backpropagated during training?
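To make the question concrete, here is roughly what I am hoping is possible (toy layers and shapes, just to show the intent; I am unsure whether retain_grad() behaves this way on an intermediate tensor):

```python
import torch
import torch.nn as nn

fc1 = nn.Linear(4, 8)
fc2 = nn.Linear(8, 1)

x = torch.randn(16, 4)
h = fc1(x)           # h_i: activations of the layer of interest
h.retain_grad()      # keep the gradient for this non-leaf tensor?
loss = fc2(torch.relu(h)).sum()
loss.backward()

# Hoping this holds dL/dh_i, one gradient per activation:
print(h.grad.shape)  # ideally (16, 8), same shape as h
```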