Hi,
I’m getting NaN values in some module parameters (the word embedding weights).
I tracked it down: the NaN values first appear right after the optim.step() call.
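To show the kind of check I mean, here is a minimal sketch. It uses a stand-in model (a single `nn.Embedding` with plain SGD), not my actual code, and the helper `nonfinite_report` is a hypothetical name made up for illustration. The loss is poisoned with NaN on purpose to mimic an upstream numerical problem. In this sketch the gradients are already non-finite before the step, and the step only writes them into the weights.

```python
import torch
import torch.nn as nn


def nonfinite_report(named_tensors):
    """Return the names of tensors that contain NaN or Inf (hypothetical helper)."""
    return [
        name
        for name, t in named_tensors
        if t is not None and not torch.isfinite(t).all()
    ]


torch.manual_seed(0)

# Stand-in model: one embedding table (assumption, not the real architecture).
emb = nn.Embedding(10, 4)
optim = torch.optim.SGD(emb.parameters(), lr=0.1)

tokens = torch.tensor([1, 2, 3])
loss = emb(tokens).sum()
# Deliberately poison the loss to simulate a NaN produced somewhere upstream.
loss = loss * torch.tensor(float("nan"))

optim.zero_grad()
loss.backward()

# Check gradients and parameters before the update ...
bad_grads = nonfinite_report((n, p.grad) for n, p in emb.named_parameters())
bad_params_before = nonfinite_report((n, p.data) for n, p in emb.named_parameters())

optim.step()

# ... and parameters again after it.
bad_params_after = nonfinite_report((n, p.data) for n, p in emb.named_parameters())

print("non-finite grads before step: ", bad_grads)
print("non-finite params before step:", bad_params_before)
print("non-finite params after step: ", bad_params_after)
```

If the gradient check comes back clean on the real model and the parameters still turn NaN after the step, the optimizer state or the learning rate would be the next thing to look at.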
What typically leads to this? (Sharing code would be impractical, as there is a lot of it and it is hard to extract a minimal reproducible example :/)
Interestingly enough, the problem only appears on some data. I have several toy datasets for testing: two based on the PTB dataset (target = source, and target = reversed source), and one where the input is a random integer sequence and the target is the sorted sequence.
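To make the three toy tasks concrete, here is a minimal sketch of how the targets relate to the sources. The function name `make_target`, the task labels, and the sequence length and vocabulary size are illustrative assumptions, not taken from my actual code. For the PTB-based tasks the source would be the token ids of a PTB sentence rather than random integers.

```python
import random


def make_target(source, task):
    """Build the target sequence for one toy task (hypothetical helper)."""
    if task == "copy":       # target = source
        return list(source)
    if task == "reverse":    # target = reversed source
        return list(reversed(source))
    if task == "sort":       # target = sorted sequence
        return sorted(source)
    raise ValueError(f"unknown task: {task}")


# Example source for the sorting task: a random integer sequence.
# Length 8 and vocabulary size 50 are arbitrary choices for illustration.
rng = random.Random(0)
source = [rng.randrange(50) for _ in range(8)]

for task in ("copy", "reverse", "sort"):
    print(task, source, "->", make_target(source, task))
```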
Thx
My model is a 2-layer bidirectional LSTM encoder with a shared embedding and a 2-layer LSTM decoder with temporal attention over the source and intra-decoder attention (as described in Paulus et al. (2017)).