Usage of cross entropy loss

Is cross entropy loss suited to multi-label classification or to binary classification?

Could you also tell me how to use it?

criterion = nn.CrossEntropyLoss().cuda()
input = torch.autograd.Variable(torch.randn((3,5)))
tgt = torch.autograd.Variable(torch.randn((3,5)))
loss = criterion(input,tgt)

Tried above, but got error
TypeError: FloatClassNLLCriterion_updateOutput received an invalid combination of arguments - got (int, torch.FloatTensor, torch.FloatTensor, torch.FloatTensor, bool, NoneType, torch.FloatTensor), but expected (int state, torch.FloatTensor input, torch.LongTensor target, torch.FloatTensor output, bool sizeAverage, [torch.FloatTensor weights or None], torch.FloatTensor total_weight)


Have a look at the documentation of CrossEntropyLoss.
It states:

It is useful when training a classification problem with C classes.

The error message gives you a hint that some types are wrong.
You should pass the target as a LongTensor.
Try changing the tgt to:

tgt = torch.autograd.Variable(torch.LongTensor(3).random_(5))
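Putting the fix together, a minimal sketch of the intended usage (using the plain tensor API of newer PyTorch versions, where Variable is no longer needed):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
input = torch.randn(3, 5)          # raw logits for 3 samples and 5 classes
target = torch.randint(5, (3,))    # one class index per sample, dtype long
loss = criterion(input, target)    # a scalar
```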

How does this represent multi-class classification then, if 3 is taken as the number of examples and 5 as the number of classes?

Can each input row be interpreted as the probabilities of mapping to the corresponding classes?

What if an example has multiple classes? How do I write that as a vector?

I'm confused a bit. Do you mean multiclass classification or multi-label classification?
CrossEntropyLoss is used for multiclass classification, i.e. predict one of several classes for each example.
For multi-label classification, there are some losses like MultiLabelMarginLoss.
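As a side note, a common multi-label setup pairs raw logits with BCEWithLogitsLoss, which applies the sigmoid internally; a minimal sketch with made-up shapes:

```python
import torch
import torch.nn as nn

logits = torch.randn(10, 3)             # raw scores for 3 independent labels
target = torch.empty(10, 3).random_(2)  # each entry is 0. or 1.

criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, target)        # a scalar
```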


Sorry, I meant multi-label classification.

Can you tell me how I can define the accuracy function for the above problem?
My label vector has ones at the classes which are present in the feature.

Sorry, I haven't used MultiLabelMarginLoss yet and would have to get familiar with it, before posting a wrong approach.
However, for multi-label classification, you could use a sigmoid in your last layer and feed it to BCELoss:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

x = Variable(torch.randn(10, 3))
output = F.sigmoid(x)                              # probability per label
target = Variable(torch.Tensor(10, 3).random_(2))  # 0/1 entry per label

criterion = nn.BCELoss(reduce=False)               # per-element losses
loss = criterion(output, target)
print(loss)

Hope this snippet is helpful.


reduce=False is causing an error:

TypeError: __init__() got an unexpected keyword argument 'reduce'

What could be the cause, and what is this argument needed for?

I just used it so that you can see the loss of each sample instead of the mean or the sum. You can safely skip this argument.
However, which PyTorch version are you using? I would suggest updating it, since the newer versions have some nice features and bug fixes. :wink:

0.1.12_1

Thanks a lot. @ptrblck :smiley:

Oh yeah, you should definitely update :wink:
You can find the install instructions on pytorch.org.


Can you suggest how can I write the accuracy function for multilabel classification?

You could use the hamming loss or "hamming score":


target = torch.FloatTensor([[0, 1, 0],
                            [1, 1, 1],
                            [0, 0, 0]])

pred = torch.FloatTensor([[0, 1, 1],
                          [1, 1, 1],
                          [0, 1, 0]])

# Fraction of label entries that match
hamming_score = 1 - (target != pred).sum().item() / float(target.nelement())

Scikit-learn provides other metrics, such as the Jaccard similarity coefficient.
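For reference, the Jaccard score (intersection over union of the predicted and true label sets, per sample) can also be computed directly in PyTorch; a sketch using the same target and pred values as above:

```python
import torch

target = torch.tensor([[0., 1., 0.],
                       [1., 1., 1.],
                       [0., 0., 0.]])
pred = torch.tensor([[0., 1., 1.],
                     [1., 1., 1.],
                     [0., 1., 0.]])

intersection = ((target == 1) & (pred == 1)).sum(dim=1).float()
union = ((target == 1) | (pred == 1)).sum(dim=1).float()
# Avoid 0/0 for samples where both label sets are empty (count them as a match)
jaccard = torch.where(union > 0, intersection / union, torch.ones_like(union))
print(jaccard.mean().item())  # → 0.5  (per-sample scores: 0.5, 1.0, 0.0)
```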

Would this work for you?


How should I binarize my outputs? My outputs are raw scores which turn into probabilities of lying in each label after a softmax.

But after the softmax, how should I threshold them to assign 0 or 1?

Please feel free to ask back if I am not clear.

I would threshold the output to get the predictions. There might be some "accuracy" metrics for probabilities which I'm not aware of, though.
I suppose you are using sigmoid instead of softmax. :wink:
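Thresholding sigmoid outputs could look like this minimal sketch (the shapes are made up):

```python
import torch

logits = torch.randn(4, 3)
probs = torch.sigmoid(logits)    # each entry in (0, 1)
preds = (probs > 0.5).float()    # 1. where the label is predicted, else 0.
```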

With sigmoid outputs, can we use 0.5 as the threshold?

Sure, you could also tune it to favor some classes, if that's important in your use case.

Do you know what is normally done in such cases?

Go for 0.5 and see if your score is good enough.
If you have an imbalanced dataset, I also compute the confusion matrix and sometimes Cohen's kappa.
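A confusion matrix for class-index predictions can be built with a bincount trick; a minimal sketch (the example labels are made up):

```python
import torch

num_classes = 3
target = torch.tensor([0, 1, 2, 2, 1, 0])
pred   = torch.tensor([0, 2, 2, 2, 1, 1])

# Each (target, pred) pair maps to a unique cell of a flattened C x C matrix
conf = torch.bincount(target * num_classes + pred,
                      minlength=num_classes ** 2).reshape(num_classes, num_classes)
# Rows are true classes, columns are predicted classes;
# the diagonal counts the correct predictions.
```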


So I have an image of size 9x50 as:

tensor([[ 0.4115, -0.6465, -1.6343,  ..., -1.0776,  0.5832],
        [-0.4022, -0.0806,  0.6202,  ...,  2.5712, -0.6777],
        [ 0.3608,  0.7212, -1.5474,  ...,  1.3508,  0.6182],
        ...,
        [-0.2229, -1.6662,  0.5105,  ...,  0.1864,  0.0211],
        [-2.1773,  0.4278,  0.2847,  ...,  0.4967,  0.8722],
        [ 0.7352, -0.6486, -0.6952,  ...,  0.6578, -0.0425]], grad_fn=<...>)

and it has labels of size 9x1 like so,

[3, 3, 7, 0, 8, 3, 3, 3, 1],

between 0 and 8 (no one-hot encoding)

which means each row of 50 dimensions in this image has a label. To summarize,

9x50 → 9x1

and, we have a batch size of 128.

So, 128x9x50 maps to 128x9.

Since each image has 9 labels, can we apply 2D cross entropy? I am not sure how to use that, though.
Or do you suggest some other loss function? What about the accuracy calculation? I guess I may not be able to use the hamming distance, since I am not using one-hot-encoded label vectors; my labels are class indices.

Regards
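For what it's worth, CrossEntropyLoss does accept extra trailing dimensions: the input can be (N, C, d) with a target of shape (N, d). A sketch under the shapes described above, assuming num_classes = 9 from the labels 0-8 and that the network emits one score vector per row:

```python
import torch
import torch.nn as nn

batch, rows, num_classes = 128, 9, 9
# Class scores must come in the second dimension: (N, C, d)
logits = torch.randn(batch, num_classes, rows)
target = torch.randint(num_classes, (batch, rows))  # one class index per row

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)                    # a scalar
```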

I tried to use cross entropy loss for video generation, but it does not work.
The input dimension is the same as the target dimension, but cross entropy loss expects the target to have one dimension fewer. How should I fix that?
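One possible fix, assuming the target holds per-class probabilities with the same shape as the input: either collapse it to class indices with argmax, or, on PyTorch 1.10 and newer, pass the probabilities directly (CrossEntropyLoss supports soft targets there). A sketch with made-up shapes:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(4, 10)                              # (N, C)
soft_target = torch.softmax(torch.randn(4, 10), dim=1)   # same shape as logits

# Option 1: reduce the dense target to hard class indices
loss_hard = criterion(logits, soft_target.argmax(dim=1))

# Option 2 (PyTorch >= 1.10): pass class probabilities directly
loss_soft = criterion(logits, soft_target)
```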