Loss is nan for CrossEntropyLoss

Very new to this. Trying to get this to work. Pretty sure I am making an easy mistake; I just can't find it. I have spent hours today trying to figure it out, so now I am humbly asking for help.

Data: 250 signals of 30,000 samples each, shape (250, 30000)
Locations: 10 different locations (the classes)
NN: 2 Linear layers, with a ReLU after the first (see the sketch below)
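
For reference, a minimal sketch of what that setup might look like in PyTorch. The hidden size of 128 is an assumption (the post doesn't state one); the shapes come from the numbers above:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(30000, 128),  # 30,000 samples per signal -> 128 hidden units (a guess)
    nn.ReLU(),
    nn.Linear(128, 10),     # 10 locations = 10 classes
)

signals = torch.randn(250, 30000)         # (250, 30000) dummy signals
locations = torch.randint(0, 10, (250,))  # class indices 0-9

# CrossEntropyLoss expects raw logits and integer class labels,
# so there is no softmax at the end of the model.
loss = nn.CrossEntropyLoss()(model(signals), locations)
print(loss.item())
```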

As far as I know all of my dimensions are correct.

Link to Notebook: https://drive.google.com/open?id=12dOdHRdJR87pVUOQ8h9_XtM4AucnHy_W

Any help would be amazing.

It seems that you have randomized both the inputs and the labels. The model is getting confused, which is expected behaviour.

This is just dumb made-up data. I wasn't hoping for a good model, just one that would run. Is that a bad assumption?

Your loss is exploding. Reduce your learning rate.
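
In code terms, that usually means something like this (the concrete values are just examples, not taken from the notebook):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(30000, 128), nn.ReLU(), nn.Linear(128, 10))  # stand-in

# Dropping the learning rate is the first thing to try when the loss
# shoots up to inf and then nan. 1e-2 -> 1e-4 is just an illustration.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

# Gradient clipping is another common guard against exploding losses
# (call it between loss.backward() and optimizer.step()):
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```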


Contrary to my initial assumption, you should try reducing the learning rate; the loss should never get as high as NaN. That said, since both the inputs and the outputs are randomized, there is no consistent mapping to learn, so there is a good chance you won't be able to learn anything even with a lower learning rate.

Thanks so much for the input. I will give it a try and let you know!

OK. I have changed the dummy data.

  • more examples: now 25,000 instead of 250
  • generate the locations first (random ints between 0 and 9)
  • generate the signals with a different STD_DEV for each location
    There should now be a pattern between location and signal, since they have different distributions (a sketch of this follows the list).
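
Roughly, the new data generation might look like this. The std values are made up, and the sizes are scaled down so the sketch stays cheap (25,000 × 30,000 floats would be about 3 GB):

```python
import torch

N_SIGNALS, N_SAMPLES, N_CLASSES = 2500, 3000, 10

locations = torch.randint(0, N_CLASSES, (N_SIGNALS,))  # labels first
std_per_class = torch.linspace(0.5, 5.0, N_CLASSES)    # a different std per location
signals = torch.randn(N_SIGNALS, N_SAMPLES) * std_per_class[locations].unsqueeze(1)
```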

I am also running the entire training sequence multiple times with different learning rates (a sketch of the sweep is below). 1e-4 is the first one at which the loss doesn't explode and actually decreases (it also seems to be the best).
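
A self-contained sketch of such a sweep, with small made-up dimensions so it runs quickly:

```python
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Linear(100, 32), nn.ReLU(), nn.Linear(32, 10))

x = torch.randn(512, 100)
y = torch.randint(0, 10, (512,))
criterion = nn.CrossEntropyLoss()

for lr in [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]:
    model = make_model()  # fresh weights for every learning rate
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for epoch in range(5):
        opt.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        opt.step()
    print(f"lr={lr:.0e}  loss after 5 epochs: {loss.item():.4f}")
```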

Am I on the right track here?
Am I calculating the loss correctly?
When do I know that the model is good? What does the loss value mean? Do I need to calculate accuracy to know if the model is good?

Now that you have localized the distribution for each class, training should work better. Ideally the loss should decrease on every subsequent epoch (at least for the first few). Pick a learning rate for which that happens.

  • It seems to me that you are computing the loss correctly. The network is a 2-layer MLP; try playing around with the model architecture a little more. Add more layers, preferably Conv1d layers (see the sketch after this list).
  • As for knowing when to stop training: for a particular set of hyperparameters, stop once the training loss has converged (remains stagnant).
  • Ideally you should also evaluate your trained model on a validation set and pick the model with the best validation accuracy.
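
Putting both of those suggestions into code, a hedged sketch might look like this. The Conv1d sizes and the validation tensors are all stand-ins, not taken from the notebook:

```python
import torch
import torch.nn as nn

# A small Conv1d classifier plus a validation-accuracy check.
model = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=7, stride=4),  # (N, 1, L) -> (N, 8, ~L/4)
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),                   # collapse the time axis
    nn.Flatten(),                              # (N, 8, 1) -> (N, 8)
    nn.Linear(8, 10),                          # 10 locations = 10 classes
)

val_x = torch.randn(200, 1, 3000)              # held-out signals (with a channel dim)
val_y = torch.randint(0, 10, (200,))           # held-out labels

model.eval()
with torch.no_grad():
    preds = model(val_x).argmax(dim=1)         # most likely class per signal
    accuracy = (preds == val_y).float().mean().item()
print(f"validation accuracy: {accuracy:.2%}")
```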

All the best

Really appreciate the help. Thank you!