What to do for non-finite warning in `clip_grad_norm`?

I started to see this warning while training a language model:

FutureWarning: Non-finite norm encountered in torch.nn.utils.clip_grad_norm_; continuing anyway. Note that the default behavior will change in a future release to error out if a non-finite total norm is encountered. At that point, setting error_if_nonfinite=false will be required to retain the old behavior.

Is this an indicator that my model is not working well? And if so, is there any recommendation on what to change? Thanks!

(I am using Adam with weight decay)

This warning indicates that some of the computed gradients are non-finite (most likely Inf or NaN). Whether that is a problem depends on your use case: are these invalid gradients expected and is clipping them acceptable, or would you rather avoid producing them in the first place?
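If you want to see which gradients are affected before clipping, here is a minimal diagnostic sketch, assuming `model` is your `nn.Module` and `backward()` has already been called:

```python
import torch

# Hedged diagnostic sketch: after loss.backward(), report every parameter
# whose gradient contains Inf or NaN. `model` is assumed to be your nn.Module.
def report_nonfinite_grads(model: torch.nn.Module) -> None:
    for name, param in model.named_parameters():
        if param.grad is not None and not torch.isfinite(param.grad).all():
            print(f"non-finite gradient in: {name}")
```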

However, in the Inf case, the total norm itself becomes Inf, so the clip coefficient `max_norm / total_norm` goes to zero: all finite gradient entries will be scaled to zero, and the Inf entries themselves turn into NaN (unless PyTorch does something specifically for this case).
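A minimal sketch illustrating this, using a toy layer with a single Inf entry injected into its gradient (not your actual model):

```python
import torch
from torch import nn

lin = nn.Linear(4, 4)
lin.weight.grad = torch.ones_like(lin.weight)
lin.weight.grad[0, 0] = float("inf")  # inject one non-finite entry

# The total norm is Inf, so the clip coefficient max_norm / total_norm is 0.
total_norm = nn.utils.clip_grad_norm_(lin.parameters(), max_norm=1.0)
print(total_norm)       # tensor(inf) -- this is what triggers the warning
print(lin.weight.grad)  # finite entries scaled to 0, the Inf entry is now NaN
```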

@ptrblck @SimonW I am using BERT/large transformers, and this happens in the middle of training. Any insights based on this? Should I increase/decrease the learning rate, max_clip_norm, warmup steps, etc.?

Maybe your model diverged… try using a smaller learning rate or an LR scheduler and see if the gradients keep diverging.
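For example, a minimal sketch of that suggestion with a lowered learning rate and a linear warmup schedule; the model, lr value, and `warmup_steps` here are assumptions, not a recommendation for your exact setup:

```python
import torch

model = torch.nn.Linear(10, 10)  # placeholder for your actual model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.01)

# Linear warmup: ramp the lr from near 0 to its full value over warmup_steps.
warmup_steps = 1000
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps))

for step in range(5):  # stand-in for your training loop
    optimizer.zero_grad()
    loss = model(torch.randn(8, 10)).pow(2).mean()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()
```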


I also get this warning after updating PyTorch to 1.9.


Hi pal, could I ask whether you still have this issue? Any hints on how to solve it? I ran into this problem lately and am kind of stuck. I suspect it might be caused by vanishing gradients and am trying to fix it by adding layer normalization. I am still monitoring the run to see if it goes well.
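In case it helps anyone landing here, a minimal sketch of what "adding layer normalization" can look like; the sizes and surrounding layers are assumptions, not my actual model:

```python
import torch
from torch import nn

block = nn.Sequential(
    nn.Linear(512, 512),
    nn.LayerNorm(512),  # normalize activations to stabilize training
    nn.ReLU(),
    nn.Linear(512, 512),
)
out = block(torch.randn(8, 512))  # sanity check: runs without error
```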