Usage of cross entropy loss

Is cross entropy loss suited to multi-label classification or to binary classification?

Could you also tell me how to use it?

criterion = nn.CrossEntropyLoss().cuda()
input = torch.autograd.Variable(torch.randn((3,5)))
tgt = torch.autograd.Variable(torch.randn((3,5)))
loss = criterion(input,tgt)

Tried above, but got error
TypeError: FloatClassNLLCriterion_updateOutput received an invalid combination of arguments - got (int, torch.FloatTensor, torch.FloatTensor, torch.FloatTensor, bool, NoneType, torch.FloatTensor), but expected (int state, torch.FloatTensor input, torch.LongTensor target, torch.FloatTensor output, bool sizeAverage, [torch.FloatTensor weights or None], torch.FloatTensor total_weight)


Have a look at the documentation of CrossEntropyLoss.
It states:

It is useful when training a classification problem with C classes.

The error message gives you a hint that some types are wrong.
You should pass the target as a LongTensor.
Try changing the tgt to:

tgt = torch.autograd.Variable(torch.LongTensor(3).random_(5))
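Putting the fix together, a minimal sketch of the intended usage (using the plain tensor API of newer PyTorch versions, where Variable is no longer needed):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
input = torch.randn(3, 5)          # raw logits for 3 samples and 5 classes
target = torch.randint(5, (3,))    # one class index per sample, dtype long
loss = criterion(input, target)    # a scalar
```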

How does this represent multi-class classification then, if 3 is taken as the number of examples and 5 as the number of classes?

Can each input row be interpreted as the probabilities of mapping to the corresponding classes?

What if an example has multiple classes? How do I write that as a vector?

I'm confused a bit. Do you mean multiclass classification or multi-label classification?
CrossEntropyLoss is used for multiclass classification, i.e. predict one of several classes for each example.
For multi-label classification, there are some losses like MultiLabelMarginLoss.
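As a side note, a common multi-label setup pairs raw logits with BCEWithLogitsLoss, which applies the sigmoid internally; a minimal sketch with made-up shapes:

```python
import torch
import torch.nn as nn

logits = torch.randn(10, 3)             # raw scores for 3 independent labels
target = torch.empty(10, 3).random_(2)  # each entry is 0. or 1.

criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, target)        # a scalar
```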


Sorry, I meant multi-label classification.

Can you tell me how I can define the accuracy function for the above problem?
My label vector has ones at the classes which are present in the feature.

Sorry, I haven't used MultiLabelMarginLoss yet and would have to get familiar with it, before posting a wrong approach.
However, for multi-label classification, you could use a sigmoid in your last layer and feed it to BCELoss:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

x = Variable(torch.randn(10, 3))
output = F.sigmoid(x)                              # probability per label
target = Variable(torch.Tensor(10, 3).random_(2))  # 0/1 entry per label

criterion = nn.BCELoss(reduce=False)               # per-element losses
loss = criterion(output, target)
print(loss)

Hope this snippet is helpful.


reduce=False is causing an error:

TypeError: __init__() got an unexpected keyword argument 'reduce'

What could be the cause, and what is this argument needed for?

I just used it so that you can see the loss of each sample instead of the mean or the sum. You can safely skip this argument.
However, which PyTorch version are you using? I would suggest updating it, since the newer versions have some nice features and bug fixes. :wink:

0.1.12_1

Thanks a lot. @ptrblck :smiley:

Oh yeah, you should definitely update :wink:
You can find the install instructions on pytorch.org.


Can you suggest how can I write the accuracy function for multilabel classification?

You could use the hamming loss or "hamming score":


target = torch.FloatTensor([[0, 1, 0],
                            [1, 1, 1],
                            [0, 0, 0]])

pred = torch.FloatTensor([[0, 1, 1],
                          [1, 1, 1],
                          [0, 1, 0]])

# Fraction of label entries that match
hamming_score = 1 - (target != pred).sum().item() / float(target.nelement())

Scikit-learn provides other metrics, such as the Jaccard similarity coefficient.
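For reference, the Jaccard score (intersection over union of the predicted and true label sets, per sample) can also be computed directly in PyTorch; a sketch using the same target and pred values as above:

```python
import torch

target = torch.tensor([[0., 1., 0.],
                       [1., 1., 1.],
                       [0., 0., 0.]])
pred = torch.tensor([[0., 1., 1.],
                     [1., 1., 1.],
                     [0., 1., 0.]])

intersection = ((target == 1) & (pred == 1)).sum(dim=1).float()
union = ((target == 1) | (pred == 1)).sum(dim=1).float()
# Avoid 0/0 for samples where both label sets are empty (count them as a match)
jaccard = torch.where(union > 0, intersection / union, torch.ones_like(union))
print(jaccard.mean().item())  # → 0.5  (per-sample scores: 0.5, 1.0, 0.0)
```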

Would this work for you?


How should I binarize my outputs? My outputs are raw scores which turn into probabilities of lying in each label after a softmax.

But after the softmax, how should I threshold them to assign 0 or 1?

Please feel free to ask back if I am not clear.

I would threshold the output to get the predictions. There might be some "accuracy" metrics for probabilities which I'm not aware of, though.
I suppose you are using sigmoid instead of softmax. :wink:
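Thresholding sigmoid outputs could look like this minimal sketch (the shapes are made up):

```python
import torch

logits = torch.randn(4, 3)
probs = torch.sigmoid(logits)    # each entry in (0, 1)
preds = (probs > 0.5).float()    # 1. where the label is predicted, else 0.
```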

With sigmoid outputs, can we use 0.5 as the threshold?

Sure, you could also tune it to favor some classes, if that's important in your use case.

Do you know what is normally done in such cases?

Go for 0.5 and see if your score is good enough.
If you have an imbalanced dataset, I also compute the confusion matrix and sometimes Cohen's kappa.
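A confusion matrix for class-index predictions can be built with a bincount trick; a minimal sketch (the example labels are made up):

```python
import torch

num_classes = 3
target = torch.tensor([0, 1, 2, 2, 1, 0])
pred   = torch.tensor([0, 2, 2, 2, 1, 1])

# Each (target, pred) pair maps to a unique cell of a flattened C x C matrix
conf = torch.bincount(target * num_classes + pred,
                      minlength=num_classes ** 2).reshape(num_classes, num_classes)
# Rows are true classes, columns are predicted classes;
# the diagonal counts the correct predictions.
```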


So I have an image of size 9x50 as:

tensor([[ 0.4115, -0.6465, -1.6343,  ..., -1.0776,  0.5832],
        [-0.4022, -0.0806,  0.6202,  ...,  2.5712, -0.6777],
        [ 0.3608,  0.7212, -1.5474,  ...,  1.3508,  0.6182],
        ...,
        [-0.2229, -1.6662,  0.5105,  ...,  0.1864,  0.0211],
        [-2.1773,  0.4278,  0.2847,  ...,  0.4967,  0.8722],
        [ 0.7352, -0.6486, -0.6952,  ...,  0.6578, -0.0425]], grad_fn=<...>)

and it has labels of size 9x1 like so,

[3, 3, 7, 0, 8, 3, 3, 3, 1],

between 0 and 8 (no one-hot encoding)

which means each row of 50 dimensions in this image has a label. To summarize,

9x50 → 9x1

and, we have a batch size of 128.

So, 128x9x50 maps to 128x9.

Since each image has 9 labels, can we apply 2D cross entropy? I am not sure how to use that, though.
Or do you suggest some other loss function? What about the accuracy calculation? I guess I may not be able to use the hamming distance, since I am not using one-hot-encoded label vectors; my labels are class indices.

Regards
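For what it's worth, CrossEntropyLoss does accept extra trailing dimensions: the input can be (N, C, d) with a target of shape (N, d). A sketch under the shapes described above, assuming num_classes = 9 from the labels 0-8 and that the network emits one score vector per row:

```python
import torch
import torch.nn as nn

batch, rows, num_classes = 128, 9, 9
# Class scores must come in the second dimension: (N, C, d)
logits = torch.randn(batch, num_classes, rows)
target = torch.randint(num_classes, (batch, rows))  # one class index per row

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)                    # a scalar
```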

I tried to use cross entropy loss for video generation, but it does not work.
The input dimension is the same as the target dimension, but cross entropy loss expects the target to have one dimension fewer. How should I fix that?
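One possible fix, assuming the target holds per-class probabilities with the same shape as the input: either collapse it to class indices with argmax, or, on PyTorch 1.10 and newer, pass the probabilities directly (CrossEntropyLoss supports soft targets there). A sketch with made-up shapes:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(4, 10)                              # (N, C)
soft_target = torch.softmax(torch.randn(4, 10), dim=1)   # same shape as logits

# Option 1: reduce the dense target to hard class indices
loss_hard = criterion(logits, soft_target.argmax(dim=1))

# Option 2 (PyTorch >= 1.10): pass class probabilities directly
loss_soft = criterion(logits, soft_target)
```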