That doesn’t work in my case, because the grad can be None for some parameters.
I use:
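def get_grad_norm(model):  # function name is just a placeholder
    total_norm = 0.0
    # Skip parameters that have no gradient (p.grad is None).
    parameters = [p for p in model.parameters() if p.grad is not None and p.requires_grad]
    for p in parameters:
        # L2 norm of this parameter's gradient
        param_norm = p.grad.detach().norm(2)
        total_norm += param_norm.item() ** 2
    # Square root of the summed squares gives the overall L2 norm.
    total_norm = total_norm ** 0.5
    return total_norm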
This works for me: I printed out the grad norm and then clipped it using a restrictive clipping threshold.
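In case it helps anyone checking the arithmetic, here is a small torch-free sketch of what that loop computes. Plain Python lists stand in for the gradient tensors, and the names `l2_norm`, `total_grad_norm` and `grads` are made up for illustration; they are not part of PyTorch. It only shows that summing the squared per-parameter norms and taking the square root gives the same number as the L2 norm of all available gradients flattened into one vector, with `None` entries skipped.

```python
import math

def l2_norm(values):
    """L2 norm of a flat list of floats."""
    return math.sqrt(sum(v * v for v in values))

def total_grad_norm(grads):
    """Combine per-parameter L2 norms, skipping parameters with no gradient."""
    total = 0.0
    for g in grads:
        if g is None:  # stands in for a parameter whose .grad is None
            continue
        total += l2_norm(g) ** 2
    return math.sqrt(total)

# Hypothetical gradients for three parameters; the second one has no gradient.
grads = [[3.0, 0.0], None, [0.0, 4.0]]
combined = total_grad_norm(grads)

# Norm of all available gradients flattened into a single vector.
flat = [v for g in grads if g is not None for v in g]
flattened = l2_norm(flat)

print(combined, flattened)
```

As far as I know, `torch.nn.utils.clip_grad_norm_` also returns the total norm it measured before clipping, so that return value may be another way to read off the same number.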