I’m training a TCN model for time series prediction. The structure is as follows:
As training progressed, I suddenly got the following error:
‘WeightNormInterfaceBackward0’ returned nan values in its 0th output.
I carefully checked the model’s parameters and found that some of them looked strange: their values were extremely small (1e-16, 1e-17) and the corresponding gradients were almost 0. After the backward pass, they became NaN.
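A check like the one described above could look roughly like this. It is only a minimal sketch: the helper name `report_suspicious_params` and the `tiny` threshold are hypothetical and chosen for illustration. If the layers are wrapped with `torch.nn.utils.weight_norm`, the affected parameters would show up under names ending in `weight_g` and `weight_v`.

```python
import torch
from torch import nn


def report_suspicious_params(model: nn.Module, tiny: float = 1e-12):
    """Return (name, max_abs_value, grad_status) for parameters that look degenerate.

    `tiny` is an arbitrary illustrative threshold, not a recommended value.
    """
    report = []
    for name, p in model.named_parameters():
        if p.numel() == 0:
            continue  # nothing to inspect in an empty parameter
        # Largest absolute entry; if even this is tiny, the whole tensor has collapsed.
        max_abs = p.detach().abs().max().item()
        if p.grad is None:
            grad_status = "no grad"
        elif not torch.isfinite(p.grad).all():
            grad_status = "non-finite grad"  # contains NaN or +/-inf
        else:
            grad_status = "ok"
        if max_abs < tiny or grad_status == "non-finite grad":
            report.append((name, max_abs, grad_status))
    return report
```

Calling it right after `loss.backward()` and before `optimizer.step()` on each iteration should show the first step at which a parameter collapses or a gradient turns into NaN.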
I have tried lowering the learning rate, but training just ran a little longer before I got the same error. I don’t know why this happens.
Has anyone run into a similar problem who can help?
What is causing this, and how can I fix it?