Usage of cross entropy loss

(Happy) #1

Is cross entropy loss good for multi-label classification or for binary-class classification?

Could you also explain how to use it?

criterion = nn.CrossEntropyLoss().cuda()
input = torch.autograd.Variable(torch.randn((3,5)))
tgt = torch.autograd.Variable(torch.randn((3,5)))
loss = criterion(input,tgt)

I tried the above, but got this error:
TypeError: FloatClassNLLCriterion_updateOutput received an invalid combination of arguments - got (int, torch.FloatTensor, torch.FloatTensor, torch.FloatTensor, bool, NoneType, torch.FloatTensor), but expected (int state, torch.FloatTensor input, torch.LongTensor target, torch.FloatTensor output, bool sizeAverage, [torch.FloatTensor weights or None], torch.FloatTensor total_weight)


Have a look at the documentation of CrossEntropyLoss.
It states:

It is useful when training a classification problem with C classes.

The error message hints that some argument types are wrong.
CrossEntropyLoss expects the target to be a LongTensor of class indices (one index per sample), not a float tensor with the same shape as the input.
Try changing tgt to:

tgt = torch.autograd.Variable(torch.LongTensor(3).random_(5))
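A minimal sketch of the same call with current PyTorch, where Variable is no longer needed. The input rows are raw, unnormalized scores (logits), not probabilities: CrossEntropyLoss applies log_softmax internally and then the negative log-likelihood loss. The variable names and the fixed seed are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Raw scores (logits): 3 samples, 5 classes.
logits = torch.randn(3, 5)
# One class index per sample, dtype torch.long, values in [0, 5).
target = torch.randint(0, 5, (3,))

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)

# Same value computed by hand: log_softmax followed by the NLL loss.
manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

# If you want probabilities, softmax maps each row to values that sum to 1.
probs = F.softmax(logits, dim=1)
```

So with a (3, 5) input, 3 is the number of samples and 5 the number of classes, and each sample is assigned exactly one class through its target index.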

(Happy) #3

How does this represent multi-class classification? Here 3 can be read as the number of examples and 5 as the number of classes.

Can each input row be interpreted as the probabilities of mapping to the corresponding classes?

What if a sample belongs to multiple classes? How do I write that as a vector?


I’m a bit confused. Do you mean multiclass classification or multi-label classification?
CrossEntropyLoss is used for multiclass classification, i.e. predicting one of several classes for each example.
For multi-label classification, there are losses such as MultiLabelMarginLoss.
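Note that MultiLabelMarginLoss does not take a 0/1 label vector directly. According to the PyTorch docs, its target has the same shape as the input but holds the indices of the positive classes, padded with -1 (only the indices before the first -1 are used). A minimal sketch, assuming a multi-hot label tensor and a hypothetical helper `multi_hot_to_margin_target` for the conversion:

```python
import torch
import torch.nn as nn

def multi_hot_to_margin_target(multi_hot):
    """Hypothetical helper: turn an (N, C) 0/1 tensor into the index format
    expected by MultiLabelMarginLoss (positive class indices, padded with -1)."""
    n, c = multi_hot.shape
    target = torch.full((n, c), -1, dtype=torch.long)
    for row in range(n):
        idx = multi_hot[row].nonzero(as_tuple=True)[0]  # indices of the ones
        target[row, :idx.numel()] = idx
    return target

criterion = nn.MultiLabelMarginLoss()
x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])   # scores for 4 classes
multi_hot = torch.tensor([[1, 0, 0, 1]])    # classes 0 and 3 are present
y = multi_hot_to_margin_target(multi_hot)   # tensor([[0, 3, -1, -1]])
loss = criterion(x, y)
```

Here the loss sums the hinge terms max(0, 1 - (x[j] - x[i])) over every positive class j and negative class i, divided by the number of classes: (1.1 + 1.3 + 0.4 + 0.6) / 4 = 0.85.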

(Happy) #5

Sorry, I meant multi-label classification.

Could you tell me how to define an accuracy function for the above problem?
My label vector has ones at the indices of the classes that are present in the sample.


Sorry, I haven’t used MultiLabelMarginLoss yet and would have to get familiar with it before posting a possibly wrong approach.
However, for multi-label classification, you could use a sigmoid in your last layer and feed it to BCELoss:

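import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

x = Variable(torch.randn(10, 3))  # logits: 10 samples, 3 labels
output = F.sigmoid(x)  # independent probability for each label
target = Variable(torch.Tensor(10, 3).random_(2))  # multi-hot targets (0 or 1)

criterion = nn.BCELoss(reduce=False)  # keep the loss of every element
loss = criterion(output, target)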

Hope this snippet is helpful.
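As an alternative sketch: nn.BCEWithLogitsLoss combines the sigmoid and BCELoss in one call and is documented as more numerically stable than a plain sigmoid followed by BCELoss. The example below, with illustrative random data, checks that both variants give the same value:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

logits = torch.randn(10, 3)                     # raw outputs of the last layer
target = torch.randint(0, 2, (10, 3)).float()   # multi-hot targets, must be float

# Variant from the post above: sigmoid, then BCELoss.
loss_sigmoid_bce = nn.BCELoss()(torch.sigmoid(logits), target)

# Fused variant: BCEWithLogitsLoss applies the sigmoid internally,
# so the model's last layer should output raw logits.
loss_with_logits = nn.BCEWithLogitsLoss()(logits, target)
```

With BCEWithLogitsLoss you would drop the sigmoid from the model during training and apply it only when you need probabilities.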

(Happy) #7

Using reduce=False causes an error:

TypeError: __init__() got an unexpected keyword argument 'reduce'

What could be the cause, and why is it needed?


I only used it so that you can see the loss of each element instead of the mean or sum. You can safely skip this argument.
The error suggests that your installed PyTorch version does not support the reduce argument yet. Which version are you using? I would suggest updating, since newer versions have some nice features and bug fixes. :wink:
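In newer PyTorch releases, reduce and size_average are deprecated in favor of a single reduction argument. A minimal sketch, with illustrative random data, of getting per-element and per-sample losses that way:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

output = torch.sigmoid(torch.randn(10, 3))
target = torch.randint(0, 2, (10, 3)).float()

# reduction='none' keeps one loss value per element (what reduce=False did).
per_element = nn.BCELoss(reduction='none')(output, target)
# The default reduction='mean' averages over all elements.
mean_loss = nn.BCELoss()(output, target)
# Per-sample loss: average over the label dimension.
per_sample = per_element.mean(dim=1)
```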

(Happy) #9


Thanks a lot. @ptrblck :smiley:


Oh yeah, you should definitely update :wink:
You can find the install instructions on

(Happy) #11

Can you suggest how I could write an accuracy function for multi-label classification?


You could use the Hamming loss (the fraction of wrongly predicted labels) or the “Hamming score” (1 minus the Hamming loss):

import torch

target = torch.FloatTensor([[0, 1, 0],
                           [1, 1, 1],
                           [0, 0, 0]])

pred = torch.FloatTensor([[0, 1, 1],
                          [1, 1, 1],
                          [0, 1, 0]])

# Fraction of label positions predicted correctly (here 7/9)
hamming_score = 1 - (target != pred).sum().item() / float(target.nelement())

scikit-learn provides other metrics, such as the Jaccard similarity coefficient.
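For reference, a sketch of a sample-averaged Jaccard similarity in plain PyTorch, using the same target and pred as above: for each sample it divides the number of labels present in both tensors by the number present in either. scikit-learn's jaccard_score with average='samples' computes a similar quantity; how rows with no labels at all are scored is an assumption here and may differ between implementations.

```python
import torch

def jaccard_samples(target, pred):
    """Sample-averaged Jaccard similarity for (N, C) multi-hot tensors."""
    target = target.bool()
    pred = pred.bool()
    intersection = (target & pred).sum(dim=1).float()
    union = (target | pred).sum(dim=1).float()
    # Assumption: a row where target and prediction are both empty counts as 1.0.
    per_sample = torch.where(union > 0,
                             intersection / union.clamp(min=1),
                             torch.ones_like(union))
    return per_sample.mean().item()

target = torch.FloatTensor([[0, 1, 0],
                            [1, 1, 1],
                            [0, 0, 0]])
pred = torch.FloatTensor([[0, 1, 1],
                          [1, 1, 1],
                          [0, 1, 0]])

score = jaccard_samples(target, pred)  # (1/2 + 3/3 + 0/1) / 3 = 0.5
```

Unlike the Hamming score, Jaccard ignores labels that are absent in both target and prediction, so it does not look inflated when most labels are zero.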

Would this work for you?

(Happy) #13

How should I turn my outputs into 0/1 predictions? My outputs are scores that look like the probability of belonging to each label if I apply a softmax.

After the softmax, how should I threshold them to assign 0 or 1?

Please feel free to ask if I'm not being clear.


I would threshold the output to get the predictions. There might be some “accuracy” metrics for probabilities which I’m not aware of, though.
I suppose you are using a sigmoid instead of a softmax, since in multi-label classification the labels are not mutually exclusive. :wink:
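A small sketch of the thresholding step, with made-up logits: the sigmoid scores each label independently, and comparing against 0.5 yields a multi-hot prediction. A softmax, by contrast, couples the classes so that each row sums to 1, which suits picking a single class rather than several.

```python
import torch

logits = torch.tensor([[2.0, -1.0, 0.3],
                       [-0.5, 0.8, 1.5]])

probs = torch.sigmoid(logits)   # each label scored independently in (0, 1)
preds = (probs > 0.5).float()   # 0.5 cut-off -> multi-hot predictions

# For comparison: softmax makes each row sum to 1.
softmax_probs = torch.softmax(logits, dim=1)
```

Since sigmoid(x) > 0.5 exactly when x > 0, thresholding the probabilities at 0.5 is equivalent to thresholding the raw logits at 0.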

(Happy) #15

With a sigmoid, can we use 0.5 as the threshold?


Sure, you could also tune it to favor some classes, if that’s important in your use case.
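Tuning per class can be as simple as comparing against a vector of thresholds, which broadcasts over the batch. The threshold values below are hypothetical; in practice you might pick them on a validation set. Lowering a class's threshold makes the model predict that class more often.

```python
import torch

probs = torch.tensor([[0.60, 0.40, 0.20],
                      [0.45, 0.35, 0.70]])

# Hypothetical per-class thresholds (class 1 is favored with a lower cut-off).
thresholds = torch.tensor([0.5, 0.3, 0.5])

# Broadcasting compares every column with its own threshold.
preds = (probs > thresholds).float()
```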

(Happy) #17

Do you know what is normally done in such cases?


Go for 0.5 and see if your score is good enough.
If the dataset is imbalanced, I also compute the confusion matrix and sometimes Cohen’s kappa.
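A minimal sketch of both metrics for binary labels in plain PyTorch, evaluated per label (column) in the multi-label case. Cohen’s kappa compares the observed agreement p_o with the agreement p_e expected by chance from the marginal frequencies: kappa = (p_o - p_e) / (1 - p_e). The helper names and data are illustrative; scikit-learn also offers confusion_matrix, multilabel_confusion_matrix and cohen_kappa_score.

```python
import torch

def binary_confusion(y_true, y_pred):
    """Return (TP, FP, FN, TN) counts for one binary label."""
    y_true = y_true.bool()
    y_pred = y_pred.bool()
    tp = (y_true & y_pred).sum().item()
    fp = (~y_true & y_pred).sum().item()
    fn = (y_true & ~y_pred).sum().item()
    tn = (~y_true & ~y_pred).sum().item()
    return tp, fp, fn, tn

def cohens_kappa(y_true, y_pred):
    """Cohen's kappa for one binary label: (p_o - p_e) / (1 - p_e)."""
    tp, fp, fn, tn = binary_confusion(y_true, y_pred)
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n  # observed agreement
    # Chance agreement from the predicted and true class frequencies.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if p_e == 1:
        return float('nan')  # undefined: both vectors contain only the same single class
    return (p_o - p_e) / (1 - p_e)

y_true = torch.tensor([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = torch.tensor([1, 0, 0, 0, 1, 1, 1, 0])
counts = binary_confusion(y_true, y_pred)  # (3, 1, 1, 3)
kappa = cohens_kappa(y_true, y_pred)       # (0.75 - 0.5) / (1 - 0.5) = 0.5

# Multi-label: evaluate every label (column) separately.
target = torch.tensor([[1, 0], [0, 1], [1, 1], [0, 0]])
preds = torch.tensor([[1, 0], [0, 0], [1, 1], [1, 0]])
per_label_kappa = [cohens_kappa(target[:, c], preds[:, c])
                   for c in range(target.shape[1])]
```

A kappa near 0 means the predictions agree with the targets no better than chance, which plain accuracy can hide when one class dominates.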