I am trying to build an autoencoder whose encoder and decoder are nested TreeLSTMs. During training, after some iterations the loss becomes NaN. I have tried decreasing the learning rate, gradient clipping, and data normalization, but the loss still becomes NaN. What could be wrong?
This is the error message: RuntimeError: Function 'MulBackward0' returned nan values in its 0th output.
Depending on your model and your training setup, there can be a few reasons.
If your model is deep enough, it might suffer from vanishing or exploding gradients, especially if you use sigmoid/tanh activations and don't apply any regularization.
It can also be that your learning rate was too high, so the gradients became infinite and your model diverged.
These are typical scenarios, and a few tricks should help, such as choosing a smaller learning rate or applying regularization.
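A minimal sketch of those tricks in a PyTorch training step. The model, optimizer, and batch here are placeholders (a plain `Linear` standing in for the TreeLSTM autoencoder), not the poster's actual code; the relevant parts are `set_detect_anomaly`, the smaller learning rate, and `clip_grad_norm_`:

```python
import torch

torch.autograd.set_detect_anomaly(True)  # report the first op that produces NaN

model = torch.nn.Linear(8, 8)            # placeholder for the real autoencoder
opt = torch.optim.Adam(model.parameters(), lr=1e-4)  # smaller learning rate
loss_fn = torch.nn.MSELoss()

x = torch.randn(32, 8)                   # placeholder batch
loss = loss_fn(model(x), x)

opt.zero_grad()
loss.backward()
# clip the global gradient norm so one bad batch cannot blow up the weights
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
opt.step()
```

Anomaly detection slows training noticeably, so it is best enabled only while hunting for the source of the NaN.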
My training set is 3000 style and content tensors. When I reduce my training set to 100 tensors and train my model only on those, I don't get NaN in my loss, but when I increase the training set again I do. Could there be another problem going on?
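That symptom (NaN only with the larger set) is consistent with a few bad samples containing NaN/inf values. A quick sketch of how to check, where `dataset` is a hypothetical list of (style, content) tensor pairs standing in for the real data:

```python
import torch

# hypothetical dataset with one deliberately corrupted sample
dataset = [(torch.randn(4), torch.randn(4)) for _ in range(5)]
dataset[3] = (torch.tensor([1.0, float("nan"), 0.0, 2.0]), torch.randn(4))

# collect indices of samples with any non-finite (NaN or inf) values
bad = [i for i, (style, content) in enumerate(dataset)
       if not (torch.isfinite(style).all() and torch.isfinite(content).all())]
print(bad)  # → [3]
```

If this reports any indices on the real 3000-tensor set, those samples would explain why the small subset trains cleanly.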