Can I use MultiMarginLoss instead of the cross-entropy loss for a classification task?
Should I apply the softmax function to the logits before passing them to “MultiMarginLoss”, so that they become normalized probabilities?
I think typically not. Speaking of which: for this reason I would not call them “logits” but “scores” or similar, because “logits” always has an “unnormalized log probabilities” ring to me, which is not the case here.
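To make the "no softmax" point concrete, here is a minimal sketch. The batch size, number of classes, and random scores are made up for illustration. Both losses take the raw scores and the class indices directly, and the multi-margin value can be reproduced by hand from the hinge formula given in the PyTorch docs (default `p=1`, `margin=1`):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
num_classes = 5                          # illustrative value
scores = torch.randn(4, num_classes)     # raw model outputs, no softmax applied
targets = torch.tensor([0, 2, 1, 4])     # class indices

# Both criteria consume the raw scores directly.
margin_loss = nn.MultiMarginLoss()(scores, targets)
ce_loss = nn.CrossEntropyLoss()(scores, targets)

# Manual multi-margin loss: sum over i != y of max(0, margin - x[y] + x[i]),
# divided by the number of classes, then averaged over the batch.
true_scores = scores.gather(1, targets.unsqueeze(1))       # shape (4, 1)
hinge = (1.0 - true_scores + scores).clamp(min=0)          # shape (4, 5)
not_target = 1.0 - F.one_hot(targets, num_classes).float() # drop the i == y term
manual = ((hinge * not_target).sum(dim=1) / num_classes).mean()

print(margin_loss.item(), manual.item(), ce_loss.item())
```

Passing softmax outputs instead would squeeze every score into [0, 1], so the differences between scores could never exceed the default margin of 1 and the hinge terms would never all reach zero.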
Thanks, I tried it and it works. I also have another question: compared with CrossEntropy, what are the advantages and disadvantages of using the Multi-Margin loss for a classification task?
Could you share your observations on the performance when using CrossEntropy? Does it perform better / create better embeddings?
My basic take on this is:
- Using softmax + negative log likelihood (which combine into CrossEntropyLoss in PyTorch lingo) has the charming property that the NLL is interpretable thanks to the “likelihood” (and the entire training becomes a maximum likelihood fit of sorts). It has proven to work over and over again, typically with number of classes <1000. It seems not to work as well for problems with many more classes.
- Using margin losses (which to me always have an SVM association) has the advantage of dealing very directly with the negative examples, and I look at it as being less prone to saturation effects than softmax when you have many classes. Edit: Also, the margin loss doesn’t have this push to extreme scores (“overconfidence” if you use the outputs of softmax as probabilities).
But you should take this with a lot of caution (“random person on the internet says”) and do your own experimentation for the problem you want to solve.
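As a starting point for that experimentation, here is a small sketch of the "push to extreme scores" point, using a made-up three-class example. Once the true class beats every other class by more than the margin, the multi-margin loss and its gradient are exactly zero, while cross-entropy still has a nonzero gradient that keeps pushing the scores apart:

```python
import torch
import torch.nn as nn

# One sample whose true class (index 0) already wins by more than margin=1.
# The concrete numbers are illustrative only.
scores = torch.tensor([[3.0, 0.5, -1.0]], requires_grad=True)
target = torch.tensor([0])

margin_loss = nn.MultiMarginLoss()(scores, target)
ce_loss = nn.CrossEntropyLoss()(scores, target)

# Gradients of each loss with respect to the scores.
(margin_grad,) = torch.autograd.grad(margin_loss, scores)
(ce_grad,) = torch.autograd.grad(ce_loss, scores)

print("margin loss:", margin_loss.item(), "grad:", margin_grad)
print("cross-entropy:", ce_loss.item(), "grad:", ce_grad)
```

The cross-entropy gradient for the true class is negative here, so a gradient step raises that score further even though the sample is already classified correctly with room to spare. Whether that matters in practice depends on the problem.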