I’m trying to implement Softmax regression from scratch, but I have a few problems.
The notebook can be viewed at the following link, or downloaded directly here.
I checked the individual functions against the ones PyTorch provides, and they seem correct (i.e., they return the same values). The problem is that when I train the model, the loss becomes NaN after a few batches. If I replace my implementations of cross-entropy and softmax with PyTorch's, training seems to work, so perhaps mine are not correct after all. The weight matrix, W, already appears to be NaN after the first batch.
I didn’t look at your code, but if you wrote your softmax and
cross-entropy functions as two separate functions you are
probably tripping over the following problem.
Softmax contains exp() and cross-entropy contains log(),
so this can happen:
large number --> exp() --> overflow (inf) --> inf / inf in softmax = NaN --> log() --> still NaN
even though, mathematically (i.e., without overflow), log (exp (large number)) = large number (no NaN).
PyTorch’s CrossEntropyLoss (for example) uses standard
techniques to combine, in effect, the log() and the exp() so that the overflow is avoided.
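One common form of such a technique is the log-sum-exp trick. The identity below is a sketch of it for a single sample; the symbols z (the logits) and m (their maximum) are my notation, not something from your notebook, and I am not claiming this is exactly how PyTorch implements it internally.

```latex
\log \operatorname{softmax}(z)_i
  = z_i - \log \sum_j e^{z_j}
  = (z_i - m) - \log \sum_j e^{z_j - m},
\qquad m = \max_j z_j
```

Every exponent z_j - m is at most 0, so no exp() can overflow, and the term with z_j = m contributes exactly 1, so the sum is at least 1 and the log() stays finite.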
If you need to write your own loss function, you will need to use
these same techniques to avoid the NaNs. If you’re just doing
this for learning (Good for you!), you have learned that this is
one good reason to use PyTorch’s built-in CrossEntropyLoss.
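To make the idea concrete, here is a minimal, dependency-free sketch of a combined log-softmax / cross-entropy for a single sample, next to a naive version that computes softmax first and takes the log afterwards. The function names are hypothetical and the logits are made up for illustration. Note one difference from tensors: plain Python's math.exp() raises OverflowError on overflow, whereas floating-point tensors would typically give inf and then NaN.

```python
import math

def log_softmax(logits):
    """Log-softmax for one sample, using the max-subtraction (log-sum-exp) trick."""
    m = max(logits)
    # Every exponent (z - m) is <= 0, so exp() cannot overflow.
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def cross_entropy(logits, target):
    """Negative log-probability of the target class index."""
    return -log_softmax(logits)[target]

def naive_cross_entropy(logits, target):
    """Softmax first, log second: overflows for large logits."""
    exps = [math.exp(z) for z in logits]
    return -math.log(exps[target] / sum(exps))

# Hypothetical logits, large enough to overflow a double in exp().
logits = [1000.0, 1001.0, 1002.0]

stable_loss = cross_entropy(logits, 2)

try:
    naive_cross_entropy(logits, 2)
    naive_failed = False
except OverflowError:
    naive_failed = True

print("stable loss:", stable_loss)
print("naive version overflowed:", naive_failed)
```

With tensors the same steps apply per row; torch.logsumexp computes the log-sum-exp term directly, so you would not need to write the max subtraction by hand.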