Gradient clipping is not working properly

MrPositron · February 3, 2021, 5:25am

Hello!

I am using gradient clipping during training as follows:

optimizer.zero_grad()
loss = criterion(output, target)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm = 1)
optimizer.step()

However, the wandb plots show that the gradients explode, ranging from -3e5 to 3e5.

This plot shows the distribution of weights for each mini-batch, so it is not an accumulated sum of all gradients.

I don’t know what is happening.

ptrblck · February 3, 2021, 9:29am

Could you add a check before and after the clipping is applied, iterate all parameters, and print their max. abs. value to isolate if the clipping is indeed not working?

MrPositron · February 3, 2021, 11:58am

Thanks! Okay, I will try that

MrPositron · February 4, 2021, 3:42pm

Gradients before norm clipping:
[4.451817512512207, 2.2666594982147217, 12.370549201965332, 2.5617222785949707, 3.411081314086914, 32.17192840576172, 2.2899510860443115, 48.48966979980469, 2.751540184020996]
And here after it:
[4.451817512512207, 2.2666594982147217, 12.370549201965332, 2.5617222785949707, 3.411081314086914, 32.17192840576172, 2.2899510860443115, 48.48966979980469, 2.751540184020996]

The way I compute gradients is as follows:

def get_gradients(model):
	grads = []
	for params in model.parameters():
		grads.append(torch.max(torch.abs(params)).item())
	return grads

Here is how I print them:

grads = get_gradients(model)
print(grads)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm = 1)
grads = get_gradients(model)
print(grads)

ptrblck · February 4, 2021, 10:46pm

Thanks for the update.
Sorry for not being clear in my previous post, but could you print the params.grad attributes?
The parameters themselves won’t be changed, but their gradients should. Also, the norm before and after would be interesting to see:

print(torch.norm(torch.cat([p.grad.view(-1) for p in model.parameters()])))
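The check suggested above could be sketched roughly as follows. This is only an illustration under assumptions: the `nn.Linear` model, the input data, and the helper names `max_abs_grads` and `grad_norm` are hypothetical and not from the original training script. It reads `p.grad` rather than the parameters, prints the max. abs. gradient and the overall norm before and after clipping, and also confirms that the weights themselves are untouched.

```python
import torch
import torch.nn as nn

def max_abs_grads(model):
    # Largest absolute gradient entry per parameter (reads p.grad, not p).
    return [p.grad.abs().max().item() for p in model.parameters() if p.grad is not None]

def grad_norm(model):
    # L2 norm over all gradients, flattened into a single vector.
    grads = [p.grad.reshape(-1) for p in model.parameters() if p.grad is not None]
    return torch.norm(torch.cat(grads)).item()

model = nn.Linear(3, 1)  # hypothetical stand-in for the real model
# Targets far from the initial predictions, so the gradients are large.
x, y = torch.ones(4, 3), torch.full((4, 1), 100.0)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

weights_before = [p.detach().clone() for p in model.parameters()]
norm_before = grad_norm(model)
print("before:", max_abs_grads(model), norm_before)

torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1)

norm_after = grad_norm(model)
print("after:", max_abs_grads(model), norm_after)

# Clipping rescales the gradients only; the parameters stay as they were.
weights_unchanged = all(
    torch.equal(a, b) for a, b in zip(weights_before, model.parameters())
)
print("weights unchanged:", weights_unchanged)
```

If clipping works, the norm printed after the call should be at most max_norm (up to floating-point error), while the norm printed before it can be arbitrarily large.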

MrPositron · February 5, 2021, 4:28am

Oh. I made a mistake. Sorry

You were perfectly clear. I wanted to print the gradients, i.e. grads.append(torch.max(torch.abs(params.grad)).item()), but somehow I forgot to do that. Such a silly mistake.

I will fix the mistake and also check the norm. Thanks for your reply! I will reply asap.

MrPositron · February 5, 2021, 12:43pm

I checked the gradients, and everything is fine. I am sorry for taking up your time. I think that W&B just logs the gradients when they are not yet clipped.
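
That explanation would be consistent with any logger that records gradients through hooks that fire during backward(). The thread does not confirm how W&B collects them, so the following is only a sketch under that assumption, using a plain tensor hook in place of the logger. The `nn.Linear` model, the data, and the `seen_by_hook` list are hypothetical.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)  # hypothetical stand-in for the real model
seen_by_hook = []

# A tensor hook fires during backward(), at the moment the gradient is
# computed, which is before any clipping code has run.
for p in model.parameters():
    p.register_hook(lambda g: seen_by_hook.append(g.abs().max().item()))

# Targets far from the initial predictions, so the gradients are large.
x, y = torch.ones(4, 3), torch.full((4, 1), 100.0)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1)

# What the optimizer will actually use in optimizer.step().
after_clip = [p.grad.abs().max().item() for p in model.parameters()]
print("seen by hook:", seen_by_hook)
print("after clipping:", after_clip)
```

In this sketch the hook records the large, unclipped values, while p.grad holds the clipped ones by the time optimizer.step() would run. A plot built from hook values can therefore show exploding gradients even when clipping is working.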