Please explain the code in the following Loss Class

Hello, I am a beginner in Deep Learning and PyTorch. If my question is not relevant or does not follow the community guidelines, please pardon me.

I am studying the following Kaggle kernel and trying to replicate it:

In the section titled “PreTrainedModels”, the author uses a resnet34 model and defines the following loss function class:

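import torch.nn as nn
import torch.nn.functional as F


class DenseCrossEntropy(nn.Module):
    def __init__(self):
        super(DenseCrossEntropy, self).__init__()

    def forward(self, logits, labels):
        # logits: raw scores, shape [nBatch, nClass]
        # labels: per-class probabilities (e.g. one-hot), shape [nBatch, nClass]
        logits = logits.float()
        labels = labels.float()
        logprobs = F.log_softmax(logits, dim=-1)  # log of predicted probabilities
        loss = -labels * logprobs                 # per-class cross-entropy terms
        loss = loss.sum(-1)                       # sum over classes: one loss per sample
        return loss.mean()                        # average over the batch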

I’ve read about the cross-entropy loss function and I think I get the basic gist of it, but I do not understand what’s going on in this class. Can someone please explain it?

I can see it’s calculating the log softmax, but then why the multiplication with the labels? And why loss.sum(-1)?

This is a Kaggle competition where given a plant image, we have to predict 4 classes (healthy or 3 types of diseases).

Thank you so much for your help!

Hi Subhankar!

Pytorch’s CrossEntropyLoss takes as its target (labels) a single
integer class label for each sample in the batch (i.e., a tensor of shape
[nBatch]). That is, target == 2 means that class “2” is the right answer
with 100% certainty.

The notion of cross-entropy is more general, however, in that it compares
two probability distributions.

So you might wish to work with probabilistic targets, e.g., for
nBatch = 1 and three classes:

target = torch.tensor([[0.1, 0.2, 0.7]])

This means that this sample is in class “0” with probability 10%, in
class “1” with probability 20%, and in class “2” with probability 70%.

This more general version of cross-entropy (which the Kaggle kernel is
calling DenseCrossEntropy) is not supported directly in pytorch versions
before 1.10, hence the need to implement it explicitly. (Since version
1.10, CrossEntropyLoss also accepts class probabilities as its target.)
It’s not very hard to implement, though, and you can do it using pytorch
tensor operations, so you get the full benefits of autograd and gpu
support.

The “multiplication with the labels” is multiplying the vector of
probabilistic labels (element-wise) with the vector of logprobs
obtained from the vector of logits (raw-score predictions). The
sum is just summing these terms in the formula for cross-entropy
across classes (within a single sample, hence .sum (dim = -1)).
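
The steps described above can be traced one at a time. This is a
minimal sketch with made-up logits and labels (2 samples, 3 classes);
the variable names per_class and per_sample are only for illustration
and do not appear in the Kaggle kernel.

```python
import torch
import torch.nn.functional as F

# Hypothetical values for illustration: 2 samples, 3 classes.
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 2.5, 0.3]])
labels = torch.tensor([[0.1, 0.2, 0.7],    # probabilistic label
                       [0.0, 1.0, 0.0]])   # one-hot label (class 1)

logprobs = F.log_softmax(logits, dim=-1)   # shape [2, 3]: log of predicted probabilities
per_class = -labels * logprobs             # shape [2, 3]: one term per class
per_sample = per_class.sum(dim=-1)         # shape [2]: cross-entropy of each sample
loss = per_sample.mean()                   # scalar: average over the batch

print(per_class.shape, per_sample.shape, loss.shape)
```

For the second sample the label is one-hot, so the sum over classes
keeps only the term for class 1, i.e. per_sample[1] is -logprobs[1, 1].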

To recap one key point: for DenseCrossEntropy, labels has
shape [nBatch, nClass], while for torch.nn.CrossEntropyLoss,
labels (target) has shape [nBatch].

You can see the formula for this general cross-entropy in Wikipedia’s
Cross entropy entry.

Compare this to the formula for pytorch’s CrossEntropyLoss that uses
categorical class labels for its target.
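
For reference, the two formulas can be written side by side. The
notation here is an assumption made for this sketch (p for the target
distribution, q for the predicted distribution, z for the logits, y for
the integer class label, C for the number of classes) and is not taken
from either linked page.

```latex
% General cross-entropy for one sample, with q = softmax(z):
H(p, q) = -\sum_{c=1}^{C} p_c \log q_c ,
\qquad
q_c = \frac{\exp(z_c)}{\sum_{j=1}^{C} \exp(z_j)}

% Special case: p is one-hot on the correct class y, so only one term survives:
H(p, q) = -\log q_y = -\log \frac{\exp(z_y)}{\sum_{j=1}^{C} \exp(z_j)}
```

In DenseCrossEntropy, logprobs holds the values log q_c, the
multiplication by labels supplies the weights p_c, and the final mean
averages H(p, q) over the samples in the batch.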

(As an aside, you can use one-hot-encoded class labels as a special
case of probabilistic labels with DenseCrossEntropy. These are
just labels that consist of all 0s (0% probability) for all of the classes
except for one 1 (100% probability) for the class that is being labelled
as “correct.”)
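
This special case can be checked numerically. The sketch below uses
made-up logits and integer labels (3 samples, 4 classes), one-hot
encodes the labels, and compares the dense computation against
pytorch's built-in cross-entropy with integer class labels. The names
class_idx, one_hot, dense_loss and builtin_loss are hypothetical.

```python
import torch
import torch.nn.functional as F

# Hypothetical values for illustration: 3 samples, 4 classes.
logits = torch.tensor([[1.2, -0.3, 0.5, 2.0],
                       [0.1, 0.4, -1.0, 0.0],
                       [-0.7, 2.2, 0.3, 0.9]])
class_idx = torch.tensor([3, 1, 0])                    # integer labels, shape [nBatch]
one_hot = F.one_hot(class_idx, num_classes=4).float()  # shape [nBatch, nClass]

# Same computation as DenseCrossEntropy.forward, applied to one-hot labels.
dense_loss = (-one_hot * F.log_softmax(logits, dim=-1)).sum(-1).mean()

# Built-in cross-entropy with integer class labels.
builtin_loss = F.cross_entropy(logits, class_idx)

# The two losses agree up to floating-point rounding.
print(torch.allclose(dense_loss, builtin_loss))
```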


K. Frank


Thank you so much!! I understood this now.