MultiLabelSoftMarginLoss vs BCEWithLogitsLoss vs KLDivLoss

Mehran_Ziadloo · December 5, 2019, 7:49pm

Sorry if my question is too basic. I’m comparing three loss functions: BCEWithLogitsLoss, MultiLabelSoftMarginLoss, and KLDivLoss. Here are the formulas for each one:

For BCEWithLogitsLoss, we have (assuming the reduction is mean):

$l(x, y) = -\frac{1}{C} \sum_{i=1}^{C} \left[ y_i \log(\sigma(x_i)) + (1 - y_i) \log(1 - \sigma(x_i)) \right]$

And for MultiLabelSoftMarginLoss, we have:

$l(x, y) = -\frac{1}{C} \sum_{i=1}^{C} \left[ y_i \log(\sigma(x_i)) + (1 - y_i) \log(\sigma(-x_i)) \right]$
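
As a quick sanity check on the two formulas above, here is a minimal plain-Python sketch (not the PyTorch implementation; the helper names `sigmoid`, `bce_with_logits`, and `multilabel_soft_margin` and the sample values are my own, chosen only for illustration). It evaluates both expressions for a single sample with C = 4 classes and compares the results, relying on the identity $\sigma(-x) = 1 - \sigma(x)$.

```python
import math


def sigmoid(x):
    """Logistic function sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def bce_with_logits(x, y):
    """First formula: mean over classes, using log(1 - sigma(x_i))."""
    c = len(x)
    total = sum(
        yi * math.log(sigmoid(xi)) + (1 - yi) * math.log(1 - sigmoid(xi))
        for xi, yi in zip(x, y)
    )
    return -total / c


def multilabel_soft_margin(x, y):
    """Second formula: mean over classes, using log(sigma(-x_i))."""
    c = len(x)
    total = sum(
        yi * math.log(sigmoid(xi)) + (1 - yi) * math.log(sigmoid(-xi))
        for xi, yi in zip(x, y)
    )
    return -total / c


# Illustrative logits and multi-hot targets for one sample (made-up values).
logits = [1.5, -0.3, 0.8, -2.0]
targets = [1.0, 0.0, 1.0, 0.0]

bce = bce_with_logits(logits, targets)
mlsm = multilabel_soft_margin(logits, targets)

print("BCE-with-logits formula:       ", bce)
print("Multi-label soft margin formula:", mlsm)
print("Equal up to rounding:", math.isclose(bce, mlsm, rel_tol=1e-9))
```

If the two values agree, as the identity suggests they should, then the two formulas describe the same quantity for a single sample, at least as written here.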

And finally, for KLDivLoss we have (with batchmean reduction):

$l(x, y) = \frac{1}{N} \sum_{i=1}^{N} y_i \left( \log(y_i) - x_i \right)$
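
To make the KLDivLoss expression concrete, here is a similar plain-Python sketch under a few assumptions: `x` holds log-probabilities (produced here by a hypothetical `log_softmax` helper), `y` is a probability distribution over the classes for each sample, the inner sum runs over the classes of each sample, and terms with $y_i = 0$ are treated as zero. It uses the usual sign convention for KL divergence, $y_i (\log y_i - x_i)$, so the result is non-negative. The function names and the sample numbers are made up for illustration.

```python
import math


def log_softmax(scores):
    """Turn raw scores into log-probabilities (numerically stable form)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def kl_div_batchmean(log_probs, targets):
    """Sum y * (log y - x) over every element, then divide by the batch size N."""
    n = len(log_probs)
    total = 0.0
    for x_row, y_row in zip(log_probs, targets):
        for xi, yi in zip(x_row, y_row):
            if yi > 0:  # a term with y == 0 contributes nothing
                total += yi * (math.log(yi) - xi)
    return total / n


# Two samples, three classes each (made-up raw scores).
scores = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
log_probs = [log_softmax(row) for row in scores]

# Soft targets: each row is a probability distribution that sums to 1.
targets = [[0.9, 0.05, 0.05], [0.05, 0.05, 0.9]]

loss = kl_div_batchmean(log_probs, targets)

# When the predicted distribution equals the target, the divergence is zero.
same = kl_div_batchmean(
    [[math.log(t) for t in row] for row in targets], targets
)

print("KL divergence (batch mean):", loss)
print("KL divergence when prediction equals target:", same)
```

Unlike the two formulas before it, this one compares two full probability distributions per sample, so the target does not have to be a 0/1 vector.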

While I understand the first formula (it’s cross-entropy) and its use case, I’m not familiar with the other two. Could someone please explain when I should use each of these three loss functions? I’ve been reading the implementation of The Annotated Transformer, which in my opinion was a good candidate for BCEWithLogitsLoss, but the author decided to use KLDivLoss instead. In general, what are the use cases for each of these three loss functions?