Why is this multi-class logistic loss named multi-class?

On page 3 of this paper: https://arxiv.org/pdf/1511.02251.pdf there is a multi-class logistic loss, which, per its description, has the form

$$\ell(f(x_n), y_n) = -\sum_{k=1}^{K} y_{nk} \log p_{nk}, \qquad p_{nk} = \frac{e^{f_k(x_n)}}{\sum_{l=1}^{K} e^{f_l(x_n)}}$$

With the accompanying text:
The multi-class logistic loss minimizes the negative sum of the log-probabilities over all positive labels. Herein, the probabilities are computed using a softmax layer.

We know that multi-class means “a sample belongs to exactly one of N classes, where N > 2”.

The loss sums over all positive labels (y_nk is 1 if class k is a positive label for sample n, and 0 otherwise), so why is it called a multi-class loss? The way I see it, this should be called a multi-label loss, since it will learn from multiple true labels if an input has more than one.
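For concreteness, here is a minimal PyTorch sketch of the loss as I read the description (the names `logits` and `targets` and the mean over the batch are my own assumptions, not the paper's notation):

```python
import torch
import torch.nn.functional as F

def multiclass_logistic_loss(logits, targets):
    """Negative sum of log-softmax probabilities over all positive labels.

    logits:  (N, K) raw scores from the network
    targets: (N, K) multi-hot tensor, y_nk = 1 for positive labels, 0 otherwise
    """
    log_probs = F.log_softmax(logits, dim=1)        # log p_nk from a softmax layer
    per_sample = -(targets * log_probs).sum(dim=1)  # sum over the positive labels k
    return per_sample.mean()                        # average over the batch (assumed)

# Toy usage: 2 samples, 4 classes; the second sample has two positive labels
logits = torch.randn(2, 4)
targets = torch.tensor([[0., 1., 0., 0.],
                        [1., 0., 1., 0.]])
print(multiclass_logistic_loss(logits, targets))
```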

First of all, I don’t think this is the right forum for this type of question, since it is not specifically related to PyTorch.

Anyway, I think they call it a multi-class loss to distinguish it from the binary classification loss, which is usually written as something like:

$$\mathcal{L} = -\sum_{i} \big[\, y_i \log p_i + (1 - y_i) \log(1 - p_i) \,\big]$$

where the y_i’s are the binary labels (either 0 or 1) and the p_i’s are the outputs of your binary classifier (usually the output of a sigmoid). It is easy to see that the multi-class loss reduces to this loss when K = 2, via a simple reparameterization that turns a two-class softmax into a sigmoid, as sketched below.
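To make the K = 2 case concrete: softmax([z_1, z_2])_1 = e^{z_1} / (e^{z_1} + e^{z_2}) = sigmoid(z_1 - z_2), so shifting both logits by z_2 turns the two-class softmax into a sigmoid. A small numerical check of this identity (the variable names and the batch of 5 samples are just my illustration):

```python
import torch

z = torch.randn(5, 2)                         # two-class logits for 5 samples
p_softmax = torch.softmax(z, dim=1)[:, 0]     # probability of class 1 from the softmax
p_sigmoid = torch.sigmoid(z[:, 0] - z[:, 1])  # same probability after reparameterizing

# The two parameterizations agree: softmax([z1, z2])_1 == sigmoid(z1 - z2)
print(torch.allclose(p_softmax, p_sigmoid))   # True
```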
