When and how should we use torch.nn.MultiMarginLoss?

  1. Can I use torch.nn.MultiMarginLoss instead of the cross-entropy loss for a classification task?

  2. Should I apply a softmax to the logits before sending them into MultiMarginLoss, to turn them into normalized probabilities?

For 1.: Yes, you can.

For 2.: I think typically not. Speaking of which, I would not call them “logits” here but rather “scores” or so, because “logits” always has an “unnormalized log probabilities” ring to me, which is not the case here.
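A minimal sketch of this (the model and sizes are made up for illustration): MultiMarginLoss is applied directly to the raw class scores, with no softmax in between.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Raw, unnormalized class scores as a model head would produce them:
# one row per sample, one column per class. No softmax is applied.
scores = torch.randn(4, 5, requires_grad=True)  # batch of 4, 5 classes
targets = torch.tensor([1, 0, 4, 2])            # correct class per sample

# Drop-in replacement for nn.CrossEntropyLoss in a classification loop.
criterion = nn.MultiMarginLoss()  # defaults: p=1, margin=1.0
loss = criterion(scores, targets)
loss.backward()                   # gradients flow to the scores as usual
print(loss.item())
```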

Best regards

Thomas

Thanks, I tried it and it works. I also have another question: compared with CrossEntropyLoss, what are the advantages and disadvantages of using the multi-margin loss for a classification task?

Could you share your observation on the performance when using MultiMarginLoss vs. CrossEntropy? Does it perform better / create better embeddings?

My basic take on this is:

  • Using softmax + negative log likelihood (which combine into CrossEntropyLoss in PyTorch lingo) has the charming property that the NLL is interpretable thanks to the “likelihood” (and the entire training becomes a maximum-likelihood fit of sorts). It has proven to work over and over again, typically with fewer than ~1000 classes. It seems to work less well for problems with many more classes.
  • Using margin losses (which to me always have an SVM association) has the advantage of dealing very directly with the negative examples, and I see it as less prone to saturation effects than softmax when you have many classes. Edit: Also, the margin doesn’t have this push toward extreme scores (“overconfidence” if you use the softmax outputs as probabilities).
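The last point can be seen in a tiny comparison (the numbers are made up): once the correct class beats every other class by the margin, the margin loss is exactly zero and stops pushing, while cross-entropy still rewards making the correct score ever larger.

```python
import torch
import torch.nn as nn

# Correct class (index 0) already wins by more than the default margin of 1.
scores = torch.tensor([[5.0, 0.0, 0.0]])
target = torch.tensor([0])

margin_loss = nn.MultiMarginLoss()(scores, target)
ce_loss = nn.CrossEntropyLoss()(scores, target)

print(margin_loss.item())  # 0.0 — margin satisfied, no further push
print(ce_loss.item())      # > 0 — softmax never fully saturates
```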

But you should take this with a lot of caution (“random person on the internet says”) and do your own experimentation for the problem you want to solve. :slight_smile:

Best regards

Thomas
