NaN values after `optim.step()`


I'm getting NaN values in some module parameters (the word embedding weights).
I've identified that the NaN values appear after the `optim.step()` call.
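A minimal sketch for narrowing this down, assuming a standard PyTorch training loop: check the gradients for NaN/Inf just before `optimizer.step()` and the parameters right after it. If the gradients are already non-finite, the problem is in the forward/backward pass, where `torch.autograd.set_detect_anomaly(True)` can help locate the offending op. If only the parameters are non-finite, the update itself is the culprit. `nonfinite_names` and `checked_step` are hypothetical helpers, not PyTorch APIs, and the toy model only stands in for the real one.

```python
import torch
import torch.nn as nn

def nonfinite_names(named_tensors):
    """Return the names of tensors that contain NaN or Inf (None is skipped)."""
    return [name for name, t in named_tensors
            if t is not None and not torch.isfinite(t).all()]

def checked_step(model, optimizer):
    """Run optimizer.step(), reporting non-finite gradients before the
    update and non-finite parameters after it."""
    bad_grads = nonfinite_names((n, p.grad) for n, p in model.named_parameters())
    optimizer.step()
    bad_params = nonfinite_names((n, p.detach()) for n, p in model.named_parameters())
    return bad_grads, bad_params

# Hypothetical toy model: an embedding followed by a linear layer.
torch.manual_seed(0)
model = nn.Sequential(nn.Embedding(10, 4), nn.Linear(4, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.tensor([1, 2, 3])

optimizer.zero_grad()
model(x).sum().backward()
print(checked_step(model, optimizer))  # ([], [])

# Simulate a bad gradient in the embedding weights.
optimizer.zero_grad()
model(x).sum().backward()
model[0].weight.grad[1, 0] = float("nan")
print(checked_step(model, optimizer))  # (['0.weight'], ['0.weight'])
```

Calling `checked_step` in place of `optimizer.step()` for a few iterations is usually enough to see on which batch, and in which parameter, the first non-finite value shows up.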

What typically leads to this? (Sharing code would be impractical: there is quite a lot going on, and it's hard to get a minimal reproducible example :/)

Interestingly, the problem only appears on some of the data. I have a few toy datasets for testing: two based on the PTB dataset (one with target = source, one with target = reversed source), and one where the input is a random integer sequence and the target is the sorted sequence.


My model consists of a 2-layer bidirectional LSTM encoder with shared embeddings and a 2-layer LSTM decoder with temporal attention over the source and intra-decoder attention (as described in Paulus et al. (2017)).

The problem was very task-specific.

For some reason, I was only using a slice of a parameter matrix rather than the whole parameter, which, I suppose, is what made it NaN.
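If you suspect that only part of a parameter is being used, one way to confirm it is to look at which columns never receive a gradient after `backward()`. This is a minimal sketch assuming a 2-D parameter such as an embedding matrix; `unused_columns` is a hypothetical helper, not a PyTorch API, and the slice `[:, :2]` is an illustrative stand-in for whatever slicing the real model does.

```python
import torch
import torch.nn as nn

def unused_columns(param):
    """Indices of the columns of a 2-D parameter that received no gradient."""
    g = param.grad
    if g is None:
        # No gradient at all: every column is unused.
        return list(range(param.shape[1]))
    # A column is unused if all of its gradient entries are exactly zero.
    return (g.abs().sum(dim=0) == 0).nonzero().flatten().tolist()

# Hypothetical example: only the first 2 of 4 embedding dimensions are used.
torch.manual_seed(0)
emb = nn.Embedding(10, 4)
x = torch.tensor([1, 2, 3])
out = emb(x)[:, :2]   # columns 2 and 3 never take part in the loss
out.sum().backward()
print(unused_columns(emb.weight))  # [2, 3]
```

Columns (rather than rows) are checked here because, in an embedding, rows for tokens absent from the batch legitimately get zero gradient; an entire embedding dimension that never gets one points to a slicing mistake.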

I have the same issue as you.