Let s denote the preactivation units (after the Linear layer, before the ReLU). I can get the gradient of the loss with respect to the preactivations, dE/ds, through register_backward_hook. But how can I get the gradients of f(dE/ds) w.r.t. the weights, where f is another function (e.g. a norm)? It seems that grad_output in the hook does not require grad and has no grad_fn.
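Here is a minimal sketch of what I mean (the layer sizes and the loss are just placeholders; I use register_full_backward_hook, the non-deprecated variant of the hook API):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lin = nn.Linear(4, 3)   # placeholder sizes
act = nn.ReLU()

captured = {}

def hook(module, grad_input, grad_output):
    # grad_output[0] is dE/ds for the preactivations s
    captured["dE_ds"] = grad_output[0]

lin.register_full_backward_hook(hook)

x = torch.randn(2, 4)
E = act(lin(x)).sum()   # some scalar loss E
E.backward()

g = captured["dE_ds"]
# The captured gradient arrives detached from the graph:
print(g.requires_grad, g.grad_fn)  # False None
```

So there is nothing to call backward on a second time from inside the hook.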

This is similar to calculating a second derivative, except that I'm considering f(dE/ds) instead of f(dE/dW). I know this is somewhat unusual, but I'm experimenting with some new ideas related to the Hessian matrix. Just wondering whether this is possible…
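For comparison, this is the second-derivative pattern I have in mind, written without hooks: if I keep a handle on the preactivation tensor s itself, torch.autograd.grad with create_graph=True gives a differentiable dE/ds. (Sketch only; the quadratic loss is a placeholder chosen so that dE/ds actually depends on W through the ReLU.)

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lin = nn.Linear(4, 3)
x = torch.randn(2, 4)

s = lin(x)                         # preactivations, kept by hand
E = (torch.relu(s) ** 2).sum()     # placeholder scalar loss E

# dE/ds with a graph attached, so it can be differentiated again
(dE_ds,) = torch.autograd.grad(E, s, create_graph=True)

f = dE_ds.norm()                   # f(dE/ds), e.g. a norm
(df_dW,) = torch.autograd.grad(f, lin.weight)
print(df_dW.shape)  # torch.Size([3, 4]), same as lin.weight
```

My question is whether the same thing can be done starting from the grad_output delivered to a backward hook, rather than from a tensor I stored manually in the forward pass.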

Thanks