NaN loss for each epoch

rim_zayani · December 27, 2019, 8:35am

Hello,

I’m training a model composed of two fully connected layers with ReLU activations.
However, I get a loss value of NaN for every epoch (I’m using MSE loss).
I printed the weights and biases and found some NaN values.
Do you have any idea how I can resolve this problem?
Here are some examples of the input signals I’m using:
sig1 sig2
Thanks in advance.

ptrblck · December 27, 2019, 9:05am

Could you check the input tensors for NaN or Inf values?
These values would create a NaN loss.
Also, how high is your learning rate?
Do you see the NaN loss right in the first batch?
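
The check suggested above can be sketched roughly as follows. This is a minimal example, not the poster's actual code: the helper name `report_non_finite` and the small `inputs` and `targets` tensors are hypothetical stand-ins. It relies only on `torch.isnan` and `torch.isinf`, and the same helper could be applied to targets or model parameters.

```python
import torch


def report_non_finite(name, tensor):
    """Count NaN and Inf entries in a tensor and print a short summary."""
    n_nan = torch.isnan(tensor).sum().item()
    n_inf = torch.isinf(tensor).sum().item()
    print(f"{name}: {n_nan} NaN, {n_inf} Inf, shape={tuple(tensor.shape)}")
    return n_nan, n_inf


# Hypothetical data: one NaN and one Inf planted in the inputs on purpose.
inputs = torch.tensor([[0.0, 0.5, 1.0],
                       [0.2, float("nan"), float("inf")]])
targets = torch.tensor([[0.1], [0.3]])

bad_inputs = report_non_finite("inputs", inputs)
bad_targets = report_non_finite("targets", targets)
```

Checking the targets as well as the inputs is worthwhile, since a single non-finite target is enough to make an MSE loss NaN.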

rim_zayani · December 27, 2019, 9:15am

I checked: the input tensors don’t contain NaN or Inf values. They consist of sparse signals.
I tried learning rates between 1e-3 and 1e-5, but I get the same result.
Yes, the NaN loss appears from the first batch.
I’m using a training dataset of 5000 signals.
I noticed that when I decrease the number of signals to 50, I don’t get NaN losses.
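
One possible reading of the 50-versus-5000 observation (an assumption, not something confirmed in this thread) is that a few problematic samples or targets sit outside the first 50. A rough way to test this is to run the untrained model over the data batch by batch and stop at the first non-finite loss. Everything in the sketch below is hypothetical: the random `signals` and `targets`, the layer sizes, the batch size, and the deliberately corrupted sample.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the real dataset: 128 signals in [0, 1].
signals = torch.rand(128, 16)
targets = torch.rand(128, 1)
targets[70, 0] = float("nan")  # deliberately corrupt one target

# Two fully connected layers with a ReLU, sizes chosen arbitrarily.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 1))
criterion = nn.MSELoss()

batch_size = 32
first_bad_batch = None
bad_rows = torch.empty(0, dtype=torch.long)

with torch.no_grad():  # forward passes only, no weight updates
    for start in range(0, signals.shape[0], batch_size):
        x = signals[start:start + batch_size]
        y = targets[start:start + batch_size]
        loss = criterion(model(x), y)
        if not torch.isfinite(loss):
            first_bad_batch = start // batch_size
            # Rows of this batch whose input or target has a non-finite entry.
            row_is_bad = ~torch.isfinite(x).all(dim=1) | ~torch.isfinite(y).all(dim=1)
            bad_rows = row_is_bad.nonzero().flatten() + start
            break

print("first bad batch:", first_bad_batch, "bad rows:", bad_rows.tolist())
```

If the loop finds no offending rows on the real data, the NaN is more likely produced during the backward pass or the optimizer step than by the data itself.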

ptrblck · December 27, 2019, 8:40pm

Are you normalizing the input data, and what value range are you currently dealing with? Based on your description, an overflow might be occurring somewhere.
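
To answer the value-range question, one could inspect the minimum and maximum of the data and, if the range is large, rescale it. The sketch below uses simple min-max scaling as one option among several. The helpers `describe_range` and `min_max_normalize`, the `eps` guard, and the `raw` tensor are all hypothetical.

```python
import torch


def describe_range(tensor):
    """Return (min, max) of a tensor as Python floats."""
    return tensor.min().item(), tensor.max().item()


def min_max_normalize(tensor, eps=1e-12):
    """Rescale a tensor to roughly [0, 1]; eps guards against division by zero."""
    lo, hi = tensor.min(), tensor.max()
    return (tensor - lo) / (hi - lo + eps)


# Hypothetical unnormalized signals with a wide value range.
raw = torch.tensor([[0.0, 250.0, 1000.0],
                    [500.0, 0.0, 750.0]])

raw_range = describe_range(raw)
scaled = min_max_normalize(raw)
scaled_range = describe_range(scaled)

print("raw range:", raw_range)
print("scaled range:", scaled_range)
```

The same range check applies to the regression targets, since large target values can inflate an MSE loss even when the inputs are well scaled.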

rim_zayani · December 29, 2019, 10:00pm

No, I’m not normalizing the input data because it’s already between 0 and 1.