I would like to compute the gradient norm with respect to the hidden units h of some layer. What is the best way to do this? I'd appreciate it if you could point me to an example. So far I understand how to compute gradients of the leaves, but the h.grad attribute is None for variables in the middle of the graph, and I cannot set requires_grad for them.
@dima use h.register_hook(lambda grad: print(grad)). You can give it a callback that will be called when the gradient for it is ready (h should be the Variable that holds the hidden state you're interested in).
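Assuming a recent PyTorch where plain Tensors take the place of Variables, a minimal sketch of that could look like this (the tensors below are made up purely for illustration):

import torch

x = torch.randn(4, 3, requires_grad=True)
w = torch.randn(3, 5, requires_grad=True)

h = x @ w                                   # intermediate (non-leaf) tensor
h.register_hook(lambda grad: print("grad norm of h:", grad.norm(2)))

loss = h.sum()
loss.backward()                             # the hook fires here and prints the norm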
All optimizers I know of require gradients with respect to the parameters, which are always leaf nodes, so this shouldn't be a problem for implementing an optimizer. In my case I need those gradients only for debugging purposes.
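To make the leaf vs. non-leaf distinction concrete, here is a small hypothetical example (the layer and data are invented for illustration): after backward(), the leaf parameters have their .grad populated, while the intermediate h.grad stays None.

import torch
import torch.nn as nn

layer = nn.Linear(3, 2)
x = torch.randn(4, 3)

h = layer(x)                          # intermediate result, not a leaf
loss = h.pow(2).sum()
loss.backward()

print(layer.weight.grad.norm(2))      # leaf parameter: .grad is populated
print(h.grad)                         # non-leaf: .grad is None by default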
For those who are interested, I ended up doing it like this:
class LazyContainer(object):
    def __call__(self, g):
        # Called by autograd with the gradient wrt h; store its L2 norm.
        self.grad_norm = g.norm(2)

container = LazyContainer()
h.register_hook(container)   # the hook fires when h's gradient is computed

# later, after backward()
print(container.grad_norm)
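Putting it together, an end-to-end sketch could look like the following; the model and data here are just placeholders I made up, not part of the original setup:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 1))
x = torch.randn(8, 10)

container = LazyContainer()

h = model[0](x)                 # hidden activations of interest
h.register_hook(container)      # the hook stores the grad norm during backward

out = model[2](model[1](h))
loss = out.pow(2).mean()
loss.backward()

print(container.grad_norm)      # gradient norm wrt the hidden units h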