What are hooks used for?

Tensors have a function: register_hook.
register_hook(hook)

Registers a backward hook.

The description says that every time a gradient with respect to the tensor is computed, the hook will be called.

My question: what are hooks used for?

Kind regards,
Jens


You could pass a function as the hook to register_hook, which will be called every time the gradient is calculated.
This might be useful for debugging purposes, e.g. just printing the gradient or its statistics, or you could of course manipulate the gradient in a custom way, e.g. by normalizing it.
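
For example, here is a minimal sketch of the debugging use (the tensor and the loss are just placeholders, not taken from this thread):

import torch

w = torch.randn(3, requires_grad=True)
# print the gradient's statistics every time the gradient w.r.t. w is computed
w.register_hook(lambda grad: print("grad mean:", grad.mean().item(), "grad max:", grad.abs().max().item()))

loss = (w ** 2).sum()
loss.backward()  # triggers the hook and prints the statistics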


Is it only used for reporting by the user, or is it also used internally by the back-propagation algorithm?


If you manipulate the gradients in the hook (e.g. by returning a new tensor), the backward pass and thus the optimizer will use these new custom gradients to update the parameters, so the latter would be true.
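
As a minimal sketch (the setup here is my own, not from the post): if the hook returns a new tensor, that tensor is used in place of the original gradient, ends up in .grad, and is therefore what the optimizer steps with.

import torch

w = torch.nn.Parameter(torch.randn(3))
opt = torch.optim.SGD([w], lr=0.1)

w.register_hook(lambda grad: grad * 0.0)  # return a new (here: zeroed) gradient

loss = (w ** 2).sum()
loss.backward()
print(w.grad)  # all zeros -> the hook's return value replaced the gradient
opt.step()     # w stays unchanged, since SGD steps with the zeroed gradient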


I see, thanks.

Are they also used if I call backward() on a tensor?

Yes, here is a small example:

import torch

x = torch.randn(1, 1)
w = torch.randn(1, 1, requires_grad=True)
w.register_hook(lambda grad: print(grad))  # called with the gradient w.r.t. w
y = torch.randn(1, 1)

out = x * w
loss = (out - y)**2
loss.register_hook(lambda grad: print(grad))  # called with the gradient w.r.t. loss
loss.mean().backward(gradient=torch.tensor(0.1))  # prints the gradient w.r.t. loss, then w.r.t. w

Can we pass any parameters of our own to this “hook” function? I mean multiple parameters.

Yes, you could pass more parameters to the lambda call:

# same script as above
my_param = "lala"
loss.register_hook(lambda grad, my_param=my_param: print(my_param, grad))
loss.mean().backward(gradient=torch.tensor(0.1))
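
As an alternative sketch (re-using the same assumed script), a named function with functools.partial works as well:

from functools import partial

def my_hook(my_param, grad):
    print(my_param, grad)

loss.register_hook(partial(my_hook, "lala"))
loss.mean().backward(gradient=torch.tensor(0.1))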

@ptrblck I was wondering if it is possible to set requires_grad = True for the registered hooks.
More specifically, I am registering hooks for a recurrent network and want to know the gradients of the gradients (second derivative), i.e. the second derivative of the loss w.r.t. each hidden state.
I could not find a method to do this directly.
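
One approach that might work (a sketch under my own assumptions, not an answer given in this thread) is to compute the first derivative with torch.autograd.grad and create_graph=True, so that it stays part of the graph and can be differentiated again:

import torch

h = torch.randn(3, requires_grad=True)  # stand-in for a hidden state
loss = (h ** 2).sum()

# first derivative, kept in the graph so it can be differentiated again
grad_h, = torch.autograd.grad(loss, h, create_graph=True)
# second derivative of the loss w.r.t. h
grad2_h, = torch.autograd.grad(grad_h.sum(), h)
print(grad2_h)  # tensor([2., 2., 2.]), since d^2(h^2)/dh^2 = 2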