When and how shall we use "torch.nn.MultiMarginLoss"?

crissallan · December 2, 2021, 4:48am

Can I use torch.nn.MultiMarginLoss instead of the cross-entropy loss for a classification task?
Should I apply the softmax function to the logits before passing them to MultiMarginLoss, so that they become normalized probabilities?

tom · December 2, 2021, 7:04am

Yes.

I think typically not. Speaking of which: for this reason I would not call them “logits” but “scores” or so, because the former always has an “unnormalized log probabilities” ring to me, which is not the case here.
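A minimal sketch of what that looks like, assuming a toy batch of 4 samples and 3 classes (the tensors and the helper `manual_multi_margin` are made up for illustration, not taken from anyone's model). The raw scores go straight into the loss without softmax, and the helper recomputes the same value by hand from the documented formula with the default p=1:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy data: 4 samples, 3 classes. These are raw model
# outputs ("scores"); no softmax is applied before the loss.
scores = torch.randn(4, 3, requires_grad=True)
target = torch.tensor([0, 2, 1, 2])

criterion = nn.MultiMarginLoss()  # defaults: p=1, margin=1.0, reduction="mean"
loss = criterion(scores, target)
loss.backward()  # gradients flow back to the scores as with any other loss


def manual_multi_margin(scores, target, margin=1.0):
    """Illustrative re-implementation for p=1 (helper name is hypothetical).

    Per sample: sum over i != y of max(0, margin - x[y] + x[i]),
    divided by the number of classes, then averaged over the batch.
    """
    num_classes = scores.shape[1]
    true_score = scores.gather(1, target.unsqueeze(1))   # shape (N, 1)
    hinge = (margin - true_score + scores).clamp(min=0)  # shape (N, C)
    mask = torch.ones_like(hinge)
    mask.scatter_(1, target.unsqueeze(1), 0.0)           # drop the i == y term
    return (hinge * mask).sum(dim=1).div(num_classes).mean()


manual = manual_multi_margin(scores.detach(), target)
print(loss.item(), manual.item())  # the two values should agree
```

In a training loop this would be a drop-in replacement for `nn.CrossEntropyLoss()`: same `(N, C)` score tensor, same integer class targets.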

Best regards

Thomas

crissallan · December 31, 2021, 3:13am

Thanks, I tried it and it works. I also have another question: compared with CrossEntropy, what are the advantages and disadvantages of using the multi-margin loss for a classification task?

InnovArul · December 31, 2021, 10:01am

Could you share your observation on the performance when using MultiMarginLoss vs. CrossEntropy? Does it perform better / create better embeddings?

tom · January 4, 2022, 9:42pm

My basic take on this is:

Using softmax + negative log likelihood (which combine to CrossEntropyLoss in PyTorch lingo) has the charming property that the NLL is interpretable thanks to the “likelihood” (and the entire training becomes a maximum likelihood fit of sorts). It has proven to work over and over again, typically with fewer than 1000 classes. It seems not to work as well for problems with many more classes.
Using margin losses (which to me always have an SVM association) has the advantage of dealing very directly with the negative examples, and I see them as less prone to saturation effects than softmax when you have many classes. Edit: Also, the margin loss doesn’t have this push to extreme scores (“overconfidence” if you use the outputs of softmax as probabilities).
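The “no push to extreme scores” point can be illustrated with a small sketch, assuming a single hand-picked sample whose correct class already beats every other class by more than the margin (the numbers are invented for illustration). Once the margin is satisfied, the margin loss and its gradient are exactly zero, while cross-entropy is still positive and keeps pushing the scores apart:

```python
import torch
import torch.nn as nn

# Hypothetical single sample: class 0 is correct and already leads the
# other classes by 2.0 and 2.5, i.e. by more than the margin of 1.0.
scores = torch.tensor([[3.0, 1.0, 0.5]], requires_grad=True)
target = torch.tensor([0])

margin_loss = nn.MultiMarginLoss(margin=1.0)(scores, target)
ce_loss = nn.CrossEntropyLoss()(scores, target)

# Gradient of each loss with respect to the scores.
(margin_grad,) = torch.autograd.grad(margin_loss, scores)
(ce_grad,) = torch.autograd.grad(ce_loss, scores)

print("margin loss:", margin_loss.item())    # 0.0, the margin is satisfied
print("margin grad:", margin_grad)           # all zeros, no further push
print("cross-entropy loss:", ce_loss.item()) # still positive
print("cross-entropy grad:", ce_grad)        # nonzero, keeps widening the gap
```

This only shows the mechanism on one sample; whether it matters for accuracy or calibration on a real task is something to check experimentally.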

But you should take this with a lot of caution (“random person on the internet says”) and do your own experimentation for the problem you want to solve.

Best regards

Thomas

crissallan · January 30, 2022, 5:56am

I would say it works for my task, but its performance was similar to that of CrossEntropy.

InnovArul · January 30, 2022, 6:04am

Oh, OK. Thanks for sharing the observation from your task.