Unable to understand loss criterion

pvskand · June 8, 2017, 10:01am

In the cifar10.py tutorial, the loss is calculated by passing the output of the network along with the labels:
loss = criterion(outputs, labels)
But when I print their sizes, I find they are different:
print outputs.size() >> (4L, 10L)
print labels.size() >> (4L)

In that case how is the loss being calculated?

tjppires · June 8, 2017, 10:21am

For the loss you only care about the probability of the correct label. In this case, you have a minibatch of size 4 and there are 10 possible categories to choose from (hence the (4L, 10L)).

If you recall the cross-entropy loss, it is -log(probability(correct_label)), averaged over the minibatch by default. So the labels are just 4 integers (one per sample in the minibatch) between 0 and 9 (since you have 10 classes).
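
To make the size mismatch concrete, here is a minimal pure-Python sketch of what `criterion(outputs, labels)` computes. It is an illustration of the formula, not PyTorch's actual implementation; the function name `cross_entropy` and the example values are made up for this sketch, and it assumes the network outputs are raw scores that get turned into probabilities with a softmax.

```python
import math

def cross_entropy(logits, labels):
    """Mean of -log(softmax(row)[label]) over the minibatch (illustrative only)."""
    losses = []
    for row, label in zip(logits, labels):
        # Log of the softmax normaliser, with the max subtracted for stability
        m = max(row)
        log_norm = m + math.log(sum(math.exp(v - m) for v in row))
        # Only the score of the correct class is picked out of each row
        log_prob_correct = row[label] - log_norm
        losses.append(-log_prob_correct)
    return sum(losses) / len(losses)

# 4 samples, 10 classes: all scores equal means a uniform prediction
outputs = [[0.0] * 10 for _ in range(4)]
labels = [3, 0, 9, 5]  # one class index per sample
loss = cross_entropy(outputs, labels)
print(loss)  # equals log(10), since every class has probability 1/10
```

Each label is used as an index into its row of 10 scores, which is why the labels need only one integer per sample.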

pvskand · June 8, 2017, 10:40am

I came across another error while calculating the loss with cross entropy.

The error said the following:

RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of dimension: 4 at /b/wheel/pytorch-src/torch/lib/THCUNN/generic/SpatialClassNLLCriterion.cu:14
I wasn’t able to understand what the error meant.
I have a batch size of 5 and the image size is 256 x 256 x 1, so the output size is (5L, 1L, 256L, 256L), and so is the size of the label.

smth · June 22, 2017, 3:30am

You misunderstand (or haven’t read) the documentation for CrossEntropyLoss, particularly the shape and meaning of the targets.
http://pytorch.org/docs/nn.html#crossentropyloss
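
As a rough sketch of the shapes involved: for per-pixel classification, `nn.CrossEntropyLoss` takes an input of shape (N, C, H, W) and a target of shape (N, H, W) holding class indices, so a target of shape (N, 1, H, W) has one dimension too many. The number of classes below (`num_classes = 3`) is a hypothetical value chosen for illustration; the post above does not state it, and the random tensors only stand in for real data.

```python
import torch
import torch.nn as nn

num_classes = 3  # hypothetical; not given in the original post
logits = torch.randn(5, num_classes, 256, 256)             # (N, C, H, W) raw scores
target = torch.randint(0, num_classes, (5, 1, 256, 256))   # (N, 1, H, W), as in the post

criterion = nn.CrossEntropyLoss()
# Drop the singleton channel dimension so the target is (N, H, W) class indices
loss = criterion(logits, target.squeeze(1))
print(loss.item())
```

If the network really has a single output channel, there is only one class to choose from, so a binary loss such as `nn.BCEWithLogitsLoss` may be closer to what is intended.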

surojit_sengupta · November 21, 2018, 8:05am

So I have an image of size 9x50 as:

tensor([[ 0.4115, -0.6465, -1.6343, 0.6694, -0.8929, 0.7482, -0.6784, -1.2556,
-0.9919, 0.7736, -1.3033, -1.4822, 1.6883, 1.3857, -0.4635, -0.4117,
0.1361, 1.2751, 1.5286, -1.0493, 0.4839, -2.1620, -1.4373, -0.3013,
0.5121, 0.7913, 0.7924, -0.7720, -0.3467, 1.1353, 0.5904, -1.8757,
0.5789, -2.0829, 1.2716, -0.2533, -0.6339, 0.5726, -0.1584, 1.2937,
-0.6060, -0.7181, -1.1443, 0.1927, 0.0326, -1.3743, -0.5325, 0.7743,
-1.0776, 0.5832],
[-0.4022, -0.0806, 0.6202, 1.4176, -0.0325, 0.2146, 0.4789, 0.2615,
-1.9354, -0.9925, -1.3699, 1.4623, 1.1422, 0.4273, 0.7865, 0.4704,
0.7516, -0.8715, -0.7594, -0.3551, 0.6217, 1.5333, -1.7359, 0.7198,
-0.4480, 0.4198, 0.5431, 0.2605, -0.5880, -0.3684, 0.5031, -1.3644,
0.3791, 0.4395, -0.0098, -0.3250, -1.9895, 0.5293, 0.5274, 1.5332,
1.0197, -1.1839, 0.2819, 1.7081, 0.1653, 0.3076, -1.0679, -0.5644,
2.5712, -0.6777],
[ 0.3608, 0.7212, -1.5474, 1.0859, -0.5586, 1.3594, -1.2196, -1.5036,
0.8116, 0.6708, 0.9988, -0.7967, -0.7120, 0.5176, -1.9599, 0.2420,
0.0513, -1.1133, 0.6954, -0.4826, -1.5786, 0.1810, 0.7230, 0.4276,
-0.2598, 0.4369, -0.3106, -0.0446, 1.1185, -0.7355, 0.0219, -0.0619,
0.0329, 0.1079, 0.2461, 0.7204, 1.0873, -1.1423, 0.0986, -0.6493,
1.1245, -0.8159, 1.3520, -0.8926, 0.4020, 1.0555, -1.1234, -0.0147,
1.3508, 0.6182],
[-0.7430, 0.5251, -0.6153, -0.0003, -0.6046, 1.1388, -0.7799, -1.9012,
-0.4144, -0.0861, 0.0823, 0.6609, 1.0585, -0.5026, -0.1830, 0.8965,
0.1796, 0.7578, 0.2869, -1.3962, -1.7420, 1.7718, 1.6606, 0.5634,
-0.1225, 0.5426, -2.1004, 0.0133, 0.7839, 1.8201, -0.0306, 0.2149,
-0.2372, -0.3642, 0.3713, 0.1301, -0.2877, -0.4470, -0.1347, -1.3249,
0.6950, 0.0947, -0.0682, -0.3107, -0.5063, -0.1554, 0.5312, 0.2986,
-0.7677, 0.9213],
[ 1.1000, -0.6128, -0.6937, -0.9583, 0.2561, -0.0408, 0.5273, -0.1111,
-0.3420, 0.9789, -0.5763, -0.3564, -1.1349, 0.2419, 1.0597, 1.3880,
0.3580, -1.2515, -0.1734, 0.2403, -0.2600, 0.4373, -0.5632, 0.5021,
1.9840, -0.5519, -1.5868, 1.2105, 1.0267, 1.4813, -1.5021, 1.6625,
-0.9624, 1.4024, 2.0388, 0.0238, 0.3076, 1.6528, -0.4595, -0.7159,
-0.8997, -1.8804, -1.1647, 1.8108, -1.4731, -1.1084, 0.5496, 1.5376,
0.1698, -0.4175],
[-0.7766, -2.0425, -0.8977, 0.0425, 1.8165, -0.6411, 0.1768, -0.7219,
-0.4880, -0.4142, 0.7928, -0.5951, 1.1639, -0.0928, -0.3169, 1.5937,
-1.1871, 0.2590, -0.1274, 0.1017, -1.0488, 0.1753, 0.5793, 0.1125,
-0.4837, -1.4312, 0.0187, -0.6604, -0.3871, 1.6479, -1.4328, 0.9142,
0.0699, 0.8660, 1.0728, -0.8291, 1.0222, 0.1272, -0.5531, 0.8532,
0.5304, 0.4040, -0.7247, 0.1954, -0.2499, 0.9694, 0.8410, -0.1247,
-1.5646, -1.3319],
[-0.2229, -1.6662, 0.5105, -0.2770, 0.3966, 1.0326, 0.9928, -0.4494,
0.6234, 0.4386, -0.6726, 1.1923, 1.1223, -0.5312, 0.2890, -0.8353,
-1.3872, -0.2604, 1.7785, 0.2281, 0.8691, 0.8132, -0.0213, -1.0649,
0.3980, -0.2038, 1.5023, 0.3054, 0.8736, 1.8556, -1.3965, 1.0579,
-0.0868, -0.3515, -1.2344, -0.2689, 1.1425, 0.1928, 1.0721, -1.5331,
-0.2131, 1.0340, 1.6211, 0.2218, 1.7555, 0.3581, 2.6108, -0.1747,
0.1864, 0.0211],
[-2.1773, 0.4278, 0.2847, 0.4405, 0.9457, -0.1819, -0.3713, 1.0402,
-0.9497, -0.0645, 0.1729, -0.6848, 0.2156, -0.0078, 0.3848, -0.4249,
1.2975, -0.4167, 0.0660, 1.6326, -0.4543, 0.7339, 0.6010, 0.8946,
1.2881, -1.0936, 1.1421, -0.5225, 0.1843, -1.0033, 0.1155, -0.4692,
1.5356, 0.1045, -1.0899, 2.0136, 1.7887, 2.1656, -1.2265, -0.0519,
0.0472, 0.2626, -0.5554, 1.6628, -1.0357, 0.4898, 1.1277, 0.0699,
0.4967, 0.8722],
[ 0.7352, -0.6486, -0.6952, 2.6622, 0.2339, -0.0961, 1.8036, -0.3650,
-1.2539, -0.0111, 0.6007, -0.3418, -0.9551, 0.3020, 1.3864, -0.0676,
0.8362, 0.1694, -0.5506, 0.4202, 0.2058, -0.9739, 1.5484, -1.0143,
-0.7052, -0.2831, 0.7834, -3.0195, 0.3679, 0.9377, 0.0174, -1.4630,
-0.4082, -0.5332, 0.8701, -0.5404, 1.8485, -1.9600, -0.1757, -0.1020,
-0.1524, -1.8317, -0.0961, 1.1949, -1.2083, 1.7236, -0.2691, 0.9958,
0.6578, -0.0425]], grad_fn=)

and it has labels of size 9x1 like so,

[3, 3, 7, 0, 8, 3, 3, 3, 1],

between 0-8 (no one hot encoding)

which means each row of 50 dimensions in this image has a label. To summarize,
image

9x50 → 9x1

and, we have a batch size of 128.

So, 128x9x50 maps to 128x9.

Since each image has 9 labels, can we apply cross-entropy? I am not sure how to use it in 2D, though.
I want to take the max of the probabilities along a single dimension (dim=1) of 128x9x9.

Or do you suggest some other loss function? My labels are class indices.

Regards

ptrblck · November 21, 2018, 8:14pm

One possible approach would be to just reshape the rows of your images and the target into the batch dimension, so that your batches will be batch_size*9.
However, your input would be 1-dimensional afterwards.
Would this be a possibility or would you like to process the input somehow as image data?
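
A minimal sketch of the reshaping idea, using the sizes from the question (128 sentences, 9 rows, 50 features, 9 classes). The single `nn.Linear` layer is a hypothetical stand-in for a real model, and the data is random. The second variant is an assumption on my side about what the "dim=1 of 128x9x9" idea could look like: it keeps the batch dimension and moves the classes to dim 1, which `nn.CrossEntropyLoss` accepts as an (N, C, d1) input with an (N, d1) target.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, num_rows, num_features, num_classes = 128, 9, 50, 9

x = torch.randn(batch_size, num_rows, num_features)
target = torch.randint(0, num_classes, (batch_size, num_rows))

classifier = nn.Linear(num_features, num_classes)  # hypothetical per-row classifier
criterion = nn.CrossEntropyLoss()

# Option 1: fold the rows into the batch dimension
logits_flat = classifier(x.view(-1, num_features))     # (128*9, 9)
loss_flat = criterion(logits_flat, target.view(-1))    # target: (128*9,)

# Option 2: keep the batch dimension and put the classes at dim=1
logits = classifier(x)                                  # (128, 9 rows, 9 classes)
loss_kdim = criterion(logits.permute(0, 2, 1), target)  # input (N, C, d1), target (N, d1)

print(loss_flat.item(), loss_kdim.item())
```

With the default mean reduction both variants average over the same 128*9 per-row losses, so they should agree up to floating-point error.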

surojit_sengupta · November 22, 2018, 4:09am

That’s a good take for images, but let me tell you my real motive!
Each image, as we call it here, is actually a sentence, and each row is a word.

It’s very important for my convolution layers to view a bunch of words together, so that the model trains not only on the word and its label but on the word’s context as well.

ptrblck · November 22, 2018, 7:49am

Thanks for the explanation.
Your idea is interesting, as you could process the input as images, i.e. the conv kernels would see some words (rows) and the coding of the words (columns).
As each word has a target, I think we still should reshape the data and target at the end.
Here is a small code example with some comments:

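import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(1, 3, 3, 1, 1)  # Square 3x3 kernels, stride 1, padding 1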
        self.pool1 = nn.MaxPool2d((1, 2))  # Pool only in width (word embeddings)
        self.conv2 = nn.Conv2d(3, 1, 3, 1, 1)
        self.pool2 = nn.MaxPool2d((1, 2))
        
        self.fc1 = nn.Linear(12, 9)
        
    def forward(self, x):
        x = F.relu(self.pool1(self.conv1(x)))
        x = F.relu(self.pool2(self.conv2(x)))
        x = x.permute(0, 2, 1, 3)  # change channels with height
        x = x.view(x.size(0)*x.size(1), -1)
        x = self.fc1(x)
        
        return x

batch_size = 12
x = torch.randn(batch_size, 9, 50)
target = torch.randint(0, 9, (batch_size, 9))  # Assuming your target has 9 values

model = MyModel()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(100):
    optimizer.zero_grad()
    output = model(x.unsqueeze(1))  # Unsqueeze to add channel dim
    loss = criterion(output, target.view(-1))  # Flatten target
    loss.backward()
    optimizer.step()
    
    print('Epoch {}, loss {}'.format(epoch, loss.item()))
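
The input size of 12 in `nn.Linear(12, 9)` follows from the layer settings above. As a small pure-Python sketch (the helper names `conv_out` and `pool_out` are made up for this illustration, and the pooling formula assumes the stride equals the kernel size, which is the `nn.MaxPool2d` default), the height and width can be traced through both conv + pool stages:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel):
    """Output length of max pooling with stride equal to the kernel size."""
    return (size - kernel) // kernel + 1

height, width = 9, 50  # 9 words, 50 embedding values per word
for _ in range(2):  # two conv + pool stages
    height = pool_out(conv_out(height, 3, 1, 1), 1)  # pool kernel 1 in height
    width = pool_out(conv_out(width, 3, 1, 1), 2)    # pool kernel 2 in width

print(height, width)  # 9 12
```

The height stays at 9, so every word keeps its own row, while the width shrinks 50 → 25 → 12. Since `conv2` has a single output channel, each word ends up with 1 * 12 = 12 features going into the linear layer.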

What do you think about this approach? Does it make sense in your use case?

surojit_sengupta · November 22, 2018, 8:14am

I’ll get back to you by tomorrow. I’ll try this after work hours!
Thanks

surojit_sengupta · November 26, 2018, 3:52am

My apologies. I haven’t yet been able to apply your idea and experiment as promised. I am dealing with a number of issues and have to deliver a base draft by a deadline.