After upgrading PyTorch from 1.1.0 to 1.3.1, I started seeing odd problems during training that seem related to the Adam optimizer.
Essentially, after a certain point the model stops training, as if the learning rate had been set to zero.
Inspecting the Adam optimizer state showed that the exponential moving average of the squared gradient is infinite for some tensors, while the moving average of the gradient itself looks fine:
tensor([[[[-1.7468e-03, -1.6368e-03, -1.2466e-03, …, -1.4461e-03, -1.6768e-03, -9.0883e-04], …
tensor([[[[inf, inf, inf, …, inf, inf, inf], [inf, inf, inf, …, inf, inf, inf], [inf, inf, inf, …, inf, inf, inf] …
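For anyone who wants to check their own run, here is a minimal sketch of how the state can be scanned for non-finite values. The helper name `find_nonfinite_adam_state` and the toy `Linear` model are my own placeholders, not part of PyTorch or of my training code; the state keys `exp_avg` and `exp_avg_sq` are the ones `torch.optim.Adam` stores per parameter.

```python
import torch


def find_nonfinite_adam_state(optimizer):
    """Return (group index, param index, state key) for every Adam state
    tensor that contains inf or NaN. Hypothetical helper for illustration."""
    bad = []
    for gi, group in enumerate(optimizer.param_groups):
        for pi, p in enumerate(group["params"]):
            # .get avoids creating an empty state entry for unseen params
            state = optimizer.state.get(p, {})
            for key in ("exp_avg", "exp_avg_sq"):
                t = state.get(key)
                if t is not None and not torch.isfinite(t).all():
                    bad.append((gi, pi, key))
    return bad


# Toy setup (placeholder model) so the helper can be run on its own.
model = torch.nn.Linear(4, 2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
model(torch.randn(8, 4)).sum().backward()
opt.step()  # the first step populates exp_avg and exp_avg_sq

print(find_nonfinite_adam_state(opt))
```

Calling this every few hundred iterations should narrow down the step at which the state first becomes infinite.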
I have never seen this problem in previous versions of PyTorch. Is this a bug or a numerical problem with my training procedure?
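On the "numerical problem" side, the symptom is at least consistent with the textbook Adam update: once the squared-gradient average overflows to inf it stays inf, and the parameter update becomes zero. The sketch below is a scalar version of that textbook rule with default hyperparameters, not PyTorch's actual implementation, and the huge gradient value is made up purely to force an overflow.

```python
import math


def adam_step(m, v, grad, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar step of the textbook Adam rule (illustrative only)."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    update = lr * m_hat / (math.sqrt(v_hat) + eps)
    return m, v, update


m, v = 0.0, 0.0

# A single gradient large enough that its square overflows (made-up value).
m, v, update = adam_step(m, v, grad=1e200, t=1)
print(v, update)

# A perfectly ordinary gradient afterwards: v is still inf, update is still 0.
m, v, update = adam_step(m, v, grad=0.5, t=2)
print(v, update)
```

So a single overflowing squared gradient would be enough to freeze the affected parameters permanently, which looks exactly like a learning rate of zero. That still does not tell me why it only happens after the upgrade.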
It might be related to this fix: https://github.com/pytorch/pytorch/pull/23737/files/b90bd775cf538381d7f5dc0327eb59937b2cca35#diff-c82300e6ba86ad06720f2bb4ecc658bd
EDIT: No, it is probably not related.