Why does not removing a register_hook() gradually slow down training?

I had to call register_hook() on a parameter inside a custom loss function to modify gradients during backpropagation. However, I did not call hook.remove() at first, and I noticed that training slowed down as the model iterated through batches and parameter updates.
My question: how does register_hook() work? What piles up, and where? How does hook.remove() fix this, and what exactly does it free?

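If you register the hook directly on your parameter, you should register it only once for the whole training run, because the parameter object stays the same across iterations. Otherwise you add a new hook every time: the hooks accumulate on the parameter, and each backward pass has to call all of them. Calling hook.remove() on the handle returned by register_hook() detaches that one hook again.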
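To make the accumulation concrete, here is a minimal sketch, not your actual training code. The tensors `w` and `v`, the helper `make_hook`, and the call counters are hypothetical names made up for illustration; the hook simply halves the gradient. The first loop registers a hook on every iteration and never removes it, while the second registers once and removes the hook at the end.

```python
import torch

def make_hook(counter):
    # Hypothetical helper: builds a gradient hook that halves the gradient
    # and records how many times it has been called.
    def hook(grad):
        counter["calls"] += 1
        return grad * 0.5
    return hook

# Anti-pattern: a new hook is registered on the same parameter each iteration
# and the returned handle is thrown away, so the hooks are never removed.
w = torch.nn.Parameter(torch.ones(3))
leaky = {"calls": 0}
for step in range(3):
    w.register_hook(make_hook(leaky))
    w.grad = None
    (w * 2).sum().backward()
leaky_grad = w.grad.clone()  # three hooks were chained on the last backward

# Register once, outside the loop, and keep the handle.
v = torch.nn.Parameter(torch.ones(3))
once = {"calls": 0}
handle = v.register_hook(make_hook(once))
for step in range(3):
    v.grad = None
    (v * 2).sum().backward()
once_grad = v.grad.clone()

# After remove(), the hook no longer runs on backward.
handle.remove()
v.grad = None
(v * 2).sum().backward()
removed_grad = v.grad.clone()

print(leaky["calls"], leaky_grad)   # hook calls grow with every iteration
print(once["calls"], once_grad)     # one hook call per backward pass
print(removed_grad)                 # unmodified gradient
```

In the first loop the number of hook calls per backward pass grows with the iteration count, which is the gradual slowdown, and the gradient is also modified once per accumulated hook, so the result silently drifts as well. If the hook really has to be created inside the loss function, keeping the handle and calling `handle.remove()` after `backward()` should have the same effect as registering once.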


That makes sense. Thanks for the explanation!