Gradient clipping is not working properly

Hello!

I am using gradient clipping during training as follows:

optimizer.zero_grad()
loss = criterion(output, target)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm = 1)
optimizer.step()

However, as you can see from the wandb plots, the gradients explode, ranging from -3e5 to 3e5.

This plot shows the distribution of weights across each mini-batch. Thus, it is not an accumulated sum of all gradients.

I don’t know what is happening :frowning:

Could you add a check before and after the clipping is applied that iterates over all parameters and prints their max. abs. value, to isolate whether the clipping is indeed not working?

Thanks! Okay, I will try that

Gradients before norm clipping:
[4.451817512512207, 2.2666594982147217, 12.370549201965332, 2.5617222785949707, 3.411081314086914, 32.17192840576172, 2.2899510860443115, 48.48966979980469, 2.751540184020996]
And here after it:
[4.451817512512207, 2.2666594982147217, 12.370549201965332, 2.5617222785949707, 3.411081314086914, 32.17192840576172, 2.2899510860443115, 48.48966979980469, 2.751540184020996]

The way I compute gradients is as follows:

def get_gradients(model):
	grads = []
	for params in model.parameters():
		grads.append(torch.max(torch.abs(params)).item())
	return grads

Here is how I print them:

grads = get_gradients(model)
print(grads)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm = 1)
grads = get_gradients(model)
print(grads)

Thanks for the update.
Sorry for not being clear in my previous post, but could you print the params.grad attributes instead?
The parameters themselves won’t be changed, but their gradients should. Also, the norm before and after would be interesting to see:

print(torch.norm(torch.cat([p.grad.view(-1) for p in model.parameters()])))
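A minimal, self-contained sketch of that check could look like the following. The `nn.Linear` model, the random data, and the helper name `grad_stats` are placeholders for illustration, not the code from this thread. It reads `p.grad` rather than `p`, and reports the max. abs. gradient per parameter together with the global norm before and after clipping.

```python
import torch
import torch.nn as nn


def grad_stats(model):
    """Return (per-parameter max abs gradient, global L2 norm of all gradients)."""
    # Skip parameters that have no gradient yet (p.grad is None).
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    max_abs = [g.abs().max().item() for g in grads]
    total_norm = torch.norm(torch.cat([g.reshape(-1) for g in grads])).item()
    return max_abs, total_norm


torch.manual_seed(0)
model = nn.Linear(10, 1)   # placeholder model (assumption)
criterion = nn.MSELoss()
x = torch.randn(32, 10)
y = 100 * torch.randn(32, 1)  # large targets, so the gradients are large

model.zero_grad()
loss = criterion(model(x), y)
loss.backward()

max_before, norm_before = grad_stats(model)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
max_after, norm_after = grad_stats(model)

print("before clipping:", max_before, norm_before)
print("after clipping: ", max_after, norm_after)
```

If clipping works, the norm after the call should be at most `max_norm` (up to floating-point error), while the parameters themselves stay unchanged until `optimizer.step()`.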

Oh. I made a mistake. Sorry :frowning:

You were perfectly clear. I wanted to print gradients, i.e. grads.append(torch.max(torch.abs(params.grad)).item()). But somehow I forgot it. Such a silly mistake.

I will fix the mistake, and will also check the norm. Thanks for your reply! I will reply asap.

I checked the gradients, and everything is fine. I am sorry for taking your time. I think that W&B just logs the gradients when they are not yet clipped.
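If that is indeed the cause, one possible workaround is to measure the gradient norm yourself right after clipping and log that value instead. The sketch below is only an illustration under assumptions: the model, the data, the helper name `clip_and_measure`, and the metric keys are made up for this example. It relies on `clip_grad_norm_` returning the total norm it computed before scaling the gradients.

```python
import torch
import torch.nn as nn


def clip_and_measure(model, max_norm):
    """Clip gradients in place and return the global norm before and after."""
    # clip_grad_norm_ returns the total norm as computed before clipping.
    pre = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)
    # Recompute the norm from the (now clipped) .grad tensors.
    post = torch.norm(torch.cat(
        [p.grad.reshape(-1) for p in model.parameters() if p.grad is not None]
    ))
    return {"grad_norm/pre_clip": float(pre), "grad_norm/post_clip": float(post)}


torch.manual_seed(0)
model = nn.Linear(10, 1)   # placeholder model (assumption)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()
x = torch.randn(32, 10)
y = 100 * torch.randn(32, 1)

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
metrics = clip_and_measure(model, max_norm=1.0)
optimizer.step()

# This dict could be passed to a logger, e.g. wandb.log(metrics).
print(metrics)
```

Logging both values side by side makes it easy to see how often clipping is active and how strongly it scales the gradients.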