Proper way to do gradient clipping?

No, loss.backward() computes the gradients, clip_grad_norm_ rescales them so that their total norm does not exceed the given threshold, and optimizer.step() updates the parameters. But yes, you need the first and the last.
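For concreteness, a minimal sketch of that order inside a training step (the model, the data, and max_norm=1.0 are made up for illustration, not part of the original question):

```python
import torch
from torch import nn

# Placeholder model, optimizer, and batch for the sketch.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()                                    # 1) compute gradients
torch.nn.utils.clip_grad_norm_(model.parameters(),
                               max_norm=1.0)       # 2) clip their total norm in place
optimizer.step()                                   # 3) update the parameters
```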

Best regards

Thomas


Does Variable.grad.data give access to the normalized (clipped) gradients per batch? If so, how can I get access to the unnormalized gradients?
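For anyone reading later: clip_grad_norm_ modifies the gradients in place, so the unclipped values are only visible between backward() and the clipping call. A small sketch of copying them at that point (the model, data, and max_norm value are assumptions for illustration):

```python
import torch
from torch import nn

# Placeholder model and batch; p.grad holds whatever backward() produced,
# and clip_grad_norm_ rescales it in place afterwards.
model = nn.Linear(10, 1)
x, y = torch.randn(8, 10), torch.randn(8, 1)

loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Copy the raw (unclipped) gradients before clipping.
raw_grads = [p.grad.detach().clone() for p in model.parameters()]

torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)

# p.grad now contains the clipped gradients.
clipped_grads = [p.grad.detach().clone() for p in model.parameters()]
```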


Normally, as the model's loss converges, the gradient norm converges as well. Is there a strategy for configuring both the lr_scheduler and a schedule for the grad-norm clipping threshold so that the model reaches an ideal convergence state, i.e. both the loss and the gradient norm converge at roughly the same step?
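One way this is sometimes tried is to decay the clipping threshold by hand alongside an ordinary learning-rate scheduler; PyTorch has no built-in threshold scheduler, so the following is only a rough sketch, and the model, data, and every schedule constant are assumptions:

```python
import torch
from torch import nn

# Hypothetical setup: decay the lr via StepLR and the clipping threshold manually.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

max_norm = 5.0  # initial clipping threshold (assumption)

for epoch in range(30):
    x, y = torch.randn(32, 10), torch.randn(32, 1)  # placeholder batch
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)
    optimizer.step()

    lr_scheduler.step()                    # decay the learning rate on its schedule
    max_norm = max(0.5, max_norm * 0.95)   # decay the clip threshold in lockstep (assumption)
```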