Proper way to do gradient clipping?

No, loss.backward() computes the gradients, clip_grad_norm_ rescales them so that their total norm does not exceed the given threshold, and optimizer.step() updates the parameters. But yes, you need the first and the last.
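For concreteness, a minimal sketch of that order inside a training step (the model, the data, and max_norm=1.0 are made up for illustration, not part of the original question):

```python
import torch
from torch import nn

# Placeholder model, optimizer, and batch for the sketch.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()                                    # 1) compute gradients
torch.nn.utils.clip_grad_norm_(model.parameters(),
                               max_norm=1.0)       # 2) clip their total norm in place
optimizer.step()                                   # 3) update the parameters
```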

Best regards

Thomas


Does Variable.grad.data give access to the normalized (clipped) gradients per batch? If so, how can I get access to the unnormalized gradients?
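For anyone reading later: clip_grad_norm_ modifies the gradients in place, so the unclipped values are only visible between backward() and the clipping call. A small sketch of copying them at that point (the model, data, and max_norm value are assumptions for illustration):

```python
import torch
from torch import nn

# Placeholder model and batch; p.grad holds whatever backward() produced,
# and clip_grad_norm_ rescales it in place afterwards.
model = nn.Linear(10, 1)
x, y = torch.randn(8, 10), torch.randn(8, 1)

loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Copy the raw (unclipped) gradients before clipping.
raw_grads = [p.grad.detach().clone() for p in model.parameters()]

torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)

# p.grad now contains the clipped gradients.
clipped_grads = [p.grad.detach().clone() for p in model.parameters()]
```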


Normally, as the model's loss converges, the gradient norm converges as well. Is there a strategy for configuring both the lr_scheduler and a schedule for the grad-norm clipping threshold so that the model reaches an ideal convergence state, i.e. both the loss and the gradient norm converge at roughly the same step?
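One way this is sometimes tried is to decay the clipping threshold by hand alongside an ordinary learning-rate scheduler; PyTorch has no built-in threshold scheduler, so the following is only a rough sketch, and the model, data, and every schedule constant are assumptions:

```python
import torch
from torch import nn

# Hypothetical setup: decay the lr via StepLR and the clipping threshold manually.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

max_norm = 5.0  # initial clipping threshold (assumption)

for epoch in range(30):
    x, y = torch.randn(32, 10), torch.randn(32, 1)  # placeholder batch
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)
    optimizer.step()

    lr_scheduler.step()                    # decay the learning rate on its schedule
    max_norm = max(0.5, max_norm * 0.95)   # decay the clip threshold in lockstep (assumption)
```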