Loss is nan for CrossEntropyLoss

Very new to this. Trying to get this to work. Pretty sure I am making an easy mistake; I just can't find it. I have spent hours today trying to figure it out, so now I am humbly asking for help.

Data: 250 signals of 30,000 samples each, shape (250, 30000)
Locations: 10 different locations (the classes)
NN: 2 Linear layers, with a ReLU after the first (see the sketch below)
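
For reference, a minimal sketch of what that setup might look like in PyTorch. The hidden size of 128 is an assumption (the post doesn't state one); the shapes come from the numbers above:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(30000, 128),  # 30,000 samples per signal -> 128 hidden units (a guess)
    nn.ReLU(),
    nn.Linear(128, 10),     # 10 locations = 10 classes
)

signals = torch.randn(250, 30000)         # (250, 30000) dummy signals
locations = torch.randint(0, 10, (250,))  # class indices 0-9

# CrossEntropyLoss expects raw logits and integer class labels,
# so there is no softmax at the end of the model.
loss = nn.CrossEntropyLoss()(model(signals), locations)
print(loss.item())
```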

As far as I know all of my dimensions are correct.

Link to Notebook: https://drive.google.com/open?id=12dOdHRdJR87pVUOQ8h9_XtM4AucnHy_W

Any help would be amazing.

It seems that you have randomized both the inputs and the labels. The model is getting confused, which is expected behaviour.

This is just dumb made-up data. I wasn't hoping for a good model, just one that would run. Is that a bad assumption?

Your loss is exploding. Reduce your learning rate.
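
In code terms, that usually means something like this (the concrete values are just examples, not taken from the notebook):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(30000, 128), nn.ReLU(), nn.Linear(128, 10))  # stand-in

# Dropping the learning rate is the first thing to try when the loss
# shoots up to inf and then nan. 1e-2 -> 1e-4 is just an illustration.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

# Gradient clipping is another common guard against exploding losses
# (call it between loss.backward() and optimizer.step()):
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```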


Contrary to my initial assumption, you should try reducing the learning rate; the loss should never get as high as NaN. That said, since both the inputs and the outputs are randomized, there is no consistent mapping to learn, so there is a good chance you won't be able to learn anything even with a lower learning rate.

Thanks so much for the input. I will give it a try and let you know!

OK. I have changed the dummy data.

  • more examples: now 25,000 instead of 250
  • generate the locations first (random ints between 0 and 9)
  • generate the signals with a different STD_DEV for each location
    There should now be a pattern between location and signal, since they have different distributions (a sketch of this follows the list).
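
Roughly, the new data generation might look like this. The std values are made up, and the sizes are scaled down so the sketch stays cheap (25,000 × 30,000 floats would be about 3 GB):

```python
import torch

N_SIGNALS, N_SAMPLES, N_CLASSES = 2500, 3000, 10

locations = torch.randint(0, N_CLASSES, (N_SIGNALS,))  # labels first
std_per_class = torch.linspace(0.5, 5.0, N_CLASSES)    # a different std per location
signals = torch.randn(N_SIGNALS, N_SAMPLES) * std_per_class[locations].unsqueeze(1)
```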

I am also running the entire training sequence multiple times with different learning rates (a sketch of the sweep is below). 1e-4 is the first one at which the loss doesn't explode and actually decreases (it also seems to be the best).
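
A self-contained sketch of such a sweep, with small made-up dimensions so it runs quickly:

```python
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Linear(100, 32), nn.ReLU(), nn.Linear(32, 10))

x = torch.randn(512, 100)
y = torch.randint(0, 10, (512,))
criterion = nn.CrossEntropyLoss()

for lr in [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]:
    model = make_model()  # fresh weights for every learning rate
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for epoch in range(5):
        opt.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        opt.step()
    print(f"lr={lr:.0e}  loss after 5 epochs: {loss.item():.4f}")
```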

Am I on the right track here?
Am I calculating the loss correctly?
When do I know that the model is good? What does the loss value mean? Do I need to calculate accuracy to know if the model is good?

Now that you have localized the distribution for each class, training should work better. Ideally the loss should decrease on every subsequent epoch (at least for the first few). Pick a learning rate for which that happens.

  • It seems to me that you are computing the loss correctly. The network is a 2-layer MLP; try playing around with the model architecture a little more. Add more layers, preferably Conv1d layers (see the sketch after this list).
  • As for knowing when to stop training: for a particular set of hyperparameters, stop once the training loss has converged (remains stagnant).
  • Ideally you should also evaluate your trained model on a validation set and pick the model with the best validation accuracy.
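
Putting both of those suggestions into code, a hedged sketch might look like this. The Conv1d sizes and the validation tensors are all stand-ins, not taken from the notebook:

```python
import torch
import torch.nn as nn

# A small Conv1d classifier plus a validation-accuracy check.
model = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=7, stride=4),  # (N, 1, L) -> (N, 8, ~L/4)
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),                   # collapse the time axis
    nn.Flatten(),                              # (N, 8, 1) -> (N, 8)
    nn.Linear(8, 10),                          # 10 locations = 10 classes
)

val_x = torch.randn(200, 1, 3000)              # held-out signals (with a channel dim)
val_y = torch.randint(0, 10, (200,))           # held-out labels

model.eval()
with torch.no_grad():
    preds = model(val_x).argmax(dim=1)         # most likely class per signal
    accuracy = (preds == val_y).float().mean().item()
print(f"validation accuracy: {accuracy:.2%}")
```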

All the best

Really appreciate the help. Thank you!