Manipulating Gradients

I’m a PyTorch beginner, and I’m curious how to mimic this functionality in PyTorch. My current approach defines a function and registers it with register_backward_hook:

import torch

def normalize_grad(module, grad_input, grad_output):
    # grad_input and grad_output each hold a single tensor here,
    # but handle the general tuple case anyway
    in_tup = []
    for gi, go in zip(grad_input, grad_output):
        if gi is None:  # inputs that don't require grad come through as None
            in_tup.append(None)
            continue
        # L1-normalize the incoming gradient, scale it, and add grad_output
        gin = gi.div(torch.norm(gi, p=1) + 1e-8).mul(module.strength).add(go)
        in_tup.append(gin)
    return tuple(in_tup)

I’m not sure if this is correct, because it changes my results dramatically.

You almost surely don’t want to use the module hook; instead, call mysomething.register_hook(...) on the tensors (Variables) themselves.

For example, I use (off the top of my head) mysomething.register_hook(lambda x: x.clamp(min=-10, max=10)) for recreating Graves’ handwriting RNN.
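To make the tensor-hook idea concrete, here is a minimal, self-contained sketch of gradient clamping with register_hook. The tensor names and values are made up for illustration; the point is that the hook rewrites the gradient flowing into the intermediate tensor before it propagates further back:

```python
import torch

x = torch.randn(4, requires_grad=True)
h = x + 0  # an intermediate tensor in the graph

# Clamp the gradient arriving at h to [-10, 10]
h.register_hook(lambda g: g.clamp(min=-10, max=10))

loss = (100 * h).sum()
loss.backward()
# Without the hook, x.grad would be 100 everywhere; the hook clamps
# h's incoming gradient to 10, so x.grad is 10 everywhere.
```

Note that the hook fires every time backward passes through h, so it applies on every call to .backward(), not just the first.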

Best regards

Thomas

So what you’re suggesting is, given a class attribute loss that is the loss computed in a module, I should call something like loss.register_hook(lambda g: g.div(torch.norm(g, p=1) + 1e-8))?

I’m not sure whether you want to do this with loss or with some intermediate quantity inside your module, but yes, that looks about what I’d do. (Note that you’re not confined to a lambda; you can also define a function if you prefer.)
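Putting the pieces together, here is a hedged sketch of the L1-normalization hook attached to an intermediate tensor rather than to a module (the tensor and the downstream weighting are made up for illustration):

```python
import torch

def l1_normalize(g):
    # Rescale the incoming gradient to (roughly) unit L1 norm;
    # the epsilon guards against division by zero.
    return g / (torch.norm(g, p=1) + 1e-8)

x = torch.randn(3, requires_grad=True)
h = x * 5.0  # some intermediate quantity in the model
h.register_hook(l1_normalize)

# Pretend downstream computation whose gradient at h is [1, 2, 3]
out = (h * torch.tensor([1.0, 2.0, 3.0])).sum()
out.backward()
# The hook normalizes [1, 2, 3] by its L1 norm (6) before it flows
# back through h = 5 * x, so x.grad is roughly [1, 2, 3] * 5 / 6.
```

If you attach the hook to loss itself, keep in mind that the gradient arriving there is just the scalar 1.0, so normalizing at an intermediate tensor is usually where this has an effect.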