I’m a PyTorch beginner, and I’m curious how to mimic this functionality in PyTorch. My current approach defines a function and registers it with register_backward_hook:
```
import torch

def normalize_grad(module, grad_input, grad_output):
    # grad_input and grad_output each have one element here, but handle the general case anyway
    in_tup = []
    for gi, go in zip(grad_input, grad_output):
        gin = gi.div(torch.norm(gi, p=1) + 1e-8).mul(module.strength).add(go)
        in_tup.append(gin)
    # the returned tuple replaces grad_input
    return tuple(in_tup)
```
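The registration looks roughly like this (conv is just a stand-in for whichever module I’m actually hooking, and strength is an attribute I set on it):

```
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3)       # stand-in for the module being hooked
conv.strength = 0.5                           # hypothetical scaling attribute the hook reads
conv.register_backward_hook(normalize_grad)
```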
I’m not sure whether this is correct, because it changes my results dramatically.
You almost surely don’t want to use the module hook, but rather call mysomething.register_hook(...) on the variables themselves.
For example, I use (off the top of my head) mysomething.register_hook(lambda x: x.clamp(min=-10, max=10)) for recreating Graves’ handwriting RNN.
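For concreteness, a tensor-level hook looks roughly like this (the tensors below are made up purely for illustration):

```
import torch

x = torch.randn(3, 4, requires_grad=True)
h = x * 2                                     # some intermediate quantity

# clip the gradient flowing back through h to [-10, 10]
h.register_hook(lambda g: g.clamp(min=-10, max=10))

loss = (h * 50).sum()
loss.backward()
print(x.grad)   # all 20.0: the gradient of 50 w.r.t. h was clipped to 10, then chained through h = x * 2
```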
So what you’re suggesting is that, given a class attribute loss that is the loss computed in a module, I should call something like loss.register_hook(lambda g: g.div(torch.norm(g, p=1) + 1e-8))?
I’m not sure whether you want to do this with loss or with some intermediate quantity inside your module, but yes, that looks about like what I’d do. (Note that you’re not confined to a lambda; you can also define a function if you prefer.)
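If it helps, the function form would look something like this (the toy tensors are only there to show where the hook gets registered):

```
import torch

def normalize_grad_hook(g):
    # same as the lambda above: L1-normalize the gradient flowing back
    return g.div(torch.norm(g, p=1) + 1e-8)

# toy setup; in practice this would be whatever quantity you want to normalize through
x = torch.randn(4, requires_grad=True)
features = x * 3
features.register_hook(normalize_grad_hook)
(features ** 2).sum().backward()
```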