Proper way to do gradient clipping?

No, loss.backward() calculates the gradients, clip_grad_norm_ limits their norm, and optimizer.step() updates the parameters. But yes, you need the first and last.
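
A minimal sketch of that ordering, assuming a toy model, loss, and optimizer (the names here are only placeholders for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical model, loss, and optimizer for illustration.
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(32, 10)
targets = torch.randn(32, 1)

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()                                                    # 1. compute gradients
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)   # 2. clip their norm in place
optimizer.step()                                                   # 3. update the parameters
```

The clipping call has to sit between backward() and step(), since it rescales the already-computed gradients before the optimizer consumes them.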

Best regards

Thomas


Does Variable.grad.data give access to the gradients normalized per batch? If so, how can I access the unnormalized gradients?
