I would like to compute the gradient norm with respect to the hidden units h of some layer. What is the best way to do this? I'd appreciate it if you could point me to an example. So far I understand how to compute gradients of the leaves, but the h.grad attribute is None for variables in the middle of the graph, and I cannot set requires_grad for them.
@dima use h.register_hook(lambda grad: print(grad)). You can give it a callback that will be called when the gradient for it is ready (h should be the Variable that holds the hidden state you're interested in).
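Assuming a recent PyTorch where plain Tensors take the place of Variables, a minimal sketch of that could look like this (the tensors below are made up purely for illustration):

import torch

x = torch.randn(4, 3, requires_grad=True)
w = torch.randn(3, 5, requires_grad=True)

h = x @ w                                   # intermediate (non-leaf) tensor
h.register_hook(lambda grad: print("grad norm of h:", grad.norm(2)))

loss = h.sum()
loss.backward()                             # the hook fires here and prints the norm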
All optimizers I know of require gradients with respect to the parameters, which are always leaf nodes, so this shouldn't be a problem for implementing an optimizer. In my case I need those gradients only for debugging purposes.
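To make the leaf vs. non-leaf distinction concrete, here is a small hypothetical example (the layer and data are invented for illustration): after backward(), the leaf parameters have their .grad populated, while the intermediate h.grad stays None.

import torch
import torch.nn as nn

layer = nn.Linear(3, 2)
x = torch.randn(4, 3)

h = layer(x)                          # intermediate result, not a leaf
loss = h.pow(2).sum()
loss.backward()

print(layer.weight.grad.norm(2))      # leaf parameter: .grad is populated
print(h.grad)                         # non-leaf: .grad is None by default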
For those who are interested, I ended up doing it like this:
class LazyContainer(object):
    def __call__(self, g):
        # Called by autograd with the gradient wrt h; store its L2 norm.
        self.grad_norm = g.norm(2)

container = LazyContainer()
h.register_hook(container)   # the hook fires when h's gradient is computed

# later, after backward()
print(container.grad_norm)
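Putting it together, an end-to-end sketch could look like the following; the model and data here are just placeholders I made up, not part of the original setup:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 1))
x = torch.randn(8, 10)

container = LazyContainer()

h = model[0](x)                 # hidden activations of interest
h.register_hook(container)      # the hook stores the grad norm during backward

out = model[2](model[1](h))
loss = out.pow(2).mean()
loss.backward()

print(container.grad_norm)      # gradient norm wrt the hidden units h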