Could you add a check before and after the clipping is applied, iterate over all parameters, and print their maximum absolute value to determine whether the clipping is indeed not working?
Thanks for the update.
Sorry for not being clear in my previous post, but could you print the params.grad attributes?
The parameters themselves won’t be changed, but their gradients should be. Also, the norm before and after would be interesting to see:
print(torch.norm(torch.cat([p.grad.view(-1) for p in model.parameters()])))
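A minimal sketch putting both checks together (the toy nn.Linear model and the max_norm value are made up for illustration; the clipping call assumes torch.nn.utils.clip_grad_norm_):

```python
import torch
import torch.nn as nn

# Hypothetical toy model, just to have some parameters with gradients
model = nn.Linear(4, 2)
out = model(torch.randn(8, 4))
out.sum().backward()

def max_abs_grad(model):
    # Largest absolute gradient entry across all parameters
    return max(p.grad.abs().max().item()
               for p in model.parameters() if p.grad is not None)

def grad_norm(model):
    # Global L2 norm over all gradients concatenated
    return torch.norm(torch.cat(
        [p.grad.view(-1) for p in model.parameters() if p.grad is not None]
    )).item()

print("before clipping:", max_abs_grad(model), grad_norm(model))
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.01)
print("after clipping: ", max_abs_grad(model), grad_norm(model))
```

If clipping works, the global norm printed after the call should be at most max_norm (here 0.01), while the parameter values themselves stay untouched until the optimizer step.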
You were perfectly clear. I wanted to print the gradients, i.e. grads.append(torch.max(torch.abs(params.grad)).item()), but somehow I forgot to do so. Such a silly mistake.
I will fix the mistake and will also check the norm. Thanks for your reply! I will reply asap.