Proper way to do gradient clipping?

Note that clip_grad_norm_ rescales the gradients only after the entire backward pass has taken place. In the RNN context it is common instead to clip the gradient as it is being backpropagated through the network. This is described e.g. in Alex Graves’ famous RNN paper (Generating Sequences With Recurrent Neural Networks).
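For illustration, here is a minimal sketch of the first approach; the LSTM, the shapes, and max_norm=1.0 are just made-up placeholders:

```python
import torch
import torch.nn as nn

model = nn.LSTM(input_size=8, hidden_size=16)   # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(5, 3, 8)     # (seq_len, batch, features)
targets = torch.randn(5, 3, 16)

optimizer.zero_grad()
outputs, _ = model(inputs)
loss = nn.functional.mse_loss(outputs, targets)
loss.backward()                   # all gradients are computed first

# only now is the global gradient norm rescaled (max_norm=1.0 is arbitrary)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```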
To do the latter, you typically use register_hook on intermediate tensors (the inputs or outputs of certain operations), e.g. with lambda grad: grad.clamp(-10, 10) to clip the gradient element-wise as it flows backward.
For a practical example, you could search for register_hook in my Graves handwriting generation notebook.
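To make the hook-based variant concrete, here is a toy sketch (the tanh and the factor of 100 are made up just to produce a large incoming gradient):

```python
import torch

x = torch.randn(4, requires_grad=True)
h = torch.tanh(x)                                   # some intermediate result
h.register_hook(lambda grad: grad.clamp(-10, 10))   # element-wise clipping

loss = (100 * h).sum()   # gradient w.r.t. h is 100 per element
loss.backward()

# the hook clamps dloss/dh from 100 to 10 before it is propagated further,
# so x.grad is 10 * (1 - tanh(x)**2) rather than 100 * (1 - tanh(x)**2)
print(x.grad)
```

Because the hook returns a new tensor, the clamped values are what get propagated further back through the graph.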

Best regards

Thomas