clip_grad_norm_() returns nan

Hi everyone

I’m training a model using torch and the clip_grad_norm_ function is returning a tensor with nan:
tensor(nan, device='cuda:0')
Is there any specific reason why this would happen? Thanks for the help.

Hi,

This might happen if the norm of your Tensors is 0? Or if any Tensor has a single element?

Please excuse my late response. The tensor has more than one element, but I did notice that the elements in the tensor are very close to zero. Could this also cause the norm to be nan? And how would I get around it? Thanks

@albanD can correct me if I’m wrong, but clip_grad_norm_ is an in-place operation and doesn’t return anything (None), which might be implicitly cast to nan. So use it like this (and do not assign the result to anything):

clip_grad_norm_(model.parameters(), 1.0)

I’m not sure about that; from the doc it does modify the gradients in place, but it also returns the total norm.
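For context, the call usually sits between backward() and step() in the training loop. A minimal sketch (the model, optimizer, and data here are made up for illustration):

```python
import torch

model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

opt.zero_grad()
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
# Clips gradients in place AND returns the pre-clipping total norm.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
opt.step()
print(total_norm)
```

If the gradients are healthy, total_norm is an ordinary finite tensor; a nan here means something upstream already produced nan gradients.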

@Maks_Botlhale Which norm are you using? This is most likely due to the content of your weights yes :confused:

Hi

I’m using norm_type=2. Yes, the clip_grad_norm_(model.parameters(), 1.0) function does return the total_norm and it’s this total norm that’s nan.

Is any element in any parameter nan (or inf) by any chance? You can use p.isinf().any() or p.isnan().any() to check.

I just checked for that; none of the elements in the parameters are infinite. See the screenshot below. I tried decreasing the learning rate and that didn’t help; some people suggested changing the dropout rate, but that didn’t help either. I also noticed that the validation loss is nan.

This is surprising…
The clip_grad_norm_ function is pretty simple and is there: https://github.com/pytorch/pytorch/blob/1c6ace87d127f45502e491b6a15886ab66975a92/torch/nn/utils/clip_grad.py#L25-L41
Can you try to copy paste that in your code and check it gives nan as well? Then you can add some prints there to see when the nan appears :slight_smile:

I copied and pasted that as suggested, and I am still getting nan values when it’s calculating the total norm. Line 36 of the code I copied calculates the total norm as:
total_norm = torch.norm(torch.stack([torch.norm(p.grad.detach(), norm_type).to(device) for p in parameters]), norm_type)

I ran p.grad.detach() on a separate line and noticed that’s where the nan values start popping up.
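One way to narrow this down is to print each parameter’s gradient norm individually, so you can see which layer introduces the nan. A sketch (the layer and the planted nan below are made up; in practice you’d pass your own model right after backward()):

```python
import torch

def report_grad_norms(model, norm_type=2.0):
    # Print every parameter's gradient norm so the nan source is visible.
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        g = p.grad.detach()
        print(name, torch.norm(g, norm_type).item(),
              "has_nan:", bool(g.isnan().any()))

# Tiny demo with a nan planted in one gradient:
m = torch.nn.Linear(2, 2)
m.weight.grad = torch.tensor([[0.1, float("nan")], [0.2, 0.3]])
m.bias.grad = torch.tensor([0.0, 1.0])
report_grad_norms(m)
```

Any single nan entry makes that parameter’s norm nan, which then makes the stacked total norm nan as well.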

Oh right (sorry, I missed that…). It computes the norm of the gradients, not of the Tensors themselves!
You need to check whether the gradients of the parameters contain nans or infs: p.grad.isnan().any() or p.grad.isinf().any()

Yes, that function also returns False. See the screenshot below.

Well, if p.grad.detach() has nans, as you said in your comment above, then the grads must already contain nans.

You can try this quite simple example; maybe it helps you find the issue:

import torch
x = torch.tensor([1., 2.])
x.grad = torch.tensor([0.4, float("inf")])  # plant an inf in the gradient
torch.nn.utils.clip_grad_norm_(x, 5)
print(x.grad)  # the inf entry makes the total norm inf, so clipping produces nan