# Unable to understand loss criterion

In the cifar10.py tutorial, I have seen that to calculate the loss you pass the output of the network along with the labels:
`loss = criterion(outputs, labels)`
But when I print their sizes, I find they are different:
`print outputs.size()` >> `(4L, 10L)`
`print labels.size()` >> `(4L)`

In that case how is the loss being calculated?

For the loss you only care about the probability of the correct label. In this case, you have a minibatch of size 4 and there are 10 possible categories to choose from (hence the (4L, 10L)).

If you recall the cross-entropy loss, it is -log(probability(correct_label)), averaged over the minibatch by default. So the labels are just 4 integers (one per sample in the minibatch) between 0 and 9 (since you have 10 classes).
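
To make the shapes concrete, here is a minimal sketch that reproduces the loss by hand. The tensor values and the label indices are made up for illustration; only the shapes match the tutorial. It assumes a PyTorch version where `nn.CrossEntropyLoss` takes raw scores (logits) and averages over the batch by default.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative stand-ins for the tutorial's tensors (values are made up).
outputs = torch.randn(4, 10)         # raw scores (logits): 4 samples, 10 classes
labels = torch.tensor([3, 8, 0, 6])  # one class index in [0, 9] per sample

criterion = nn.CrossEntropyLoss()    # default reduction is the batch mean
loss = criterion(outputs, labels)

# Same computation by hand: log-softmax over the class dimension,
# pick the log-probability of the correct class per sample, negate, average.
log_probs = F.log_softmax(outputs, dim=1)    # shape (4, 10)
picked = log_probs[torch.arange(4), labels]  # shape (4,)
manual_loss = -picked.mean()

print(outputs.size(), labels.size())
print(loss.item(), manual_loss.item())
```

The two printed loss values should agree, which shows why a `(4, 10)` output and a `(4,)` label tensor are compatible: each label just selects one column per row.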

I came across another error while calculating the loss with cross entropy.

The error said the following:

`RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of dimension: 4 at /b/wheel/pytorch-src/torch/lib/THCUNN/generic/SpatialClassNLLCriterion.cu:14`
I wasn’t able to understand what the error meant.
I have a batch size of 5 and the image size is 256 x 256 x 1, so the output size is (5L, 1L, 256L, 256L), and so is the size of the label.

You misunderstand (or haven’t read) the documentation for CrossEntropyLoss, particularly the shape and meaning of the targets.
http://pytorch.org/docs/nn.html#crossentropyloss
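
As a rough sketch of what the documentation asks for: with a 4D output of shape `(N, C, H, W)`, the target holds one class index per pixel and has shape `(N, H, W)`, without a channel dimension. The example below assumes a two-class segmentation task (`C = 2`), which the post does not state, and uses random tensors. The single-channel variant with `nn.BCEWithLogitsLoss` is a possible alternative for binary masks, not something suggested in the thread.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# C = 2 is an assumption (e.g. background / foreground); the post has C = 1.
N, C, H, W = 5, 2, 256, 256
outputs = torch.randn(N, C, H, W)           # per-pixel logits
labels = torch.randint(0, C, (N, 1, H, W))  # labels stored with a channel dim, as in the post

# CrossEntropyLoss wants class indices of shape (N, H, W) and dtype long,
# so drop the channel dimension of the label tensor.
target = labels.squeeze(1).long()
loss_ce = nn.CrossEntropyLoss()(outputs, target)

# Alternative for a single-channel output: treat it as a binary problem.
# BCEWithLogitsLoss takes a float target with the same shape as the output.
single_logit = torch.randn(N, 1, H, W)
loss_bce = nn.BCEWithLogitsLoss()(single_logit, labels.float())

print(target.size(), loss_ce.item(), loss_bce.item())
```

Note that with a single output channel, a softmax over the class dimension always gives probability 1, so cross entropy over `C = 1` carries no information; that is why the sketch uses either two channels or the binary loss.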

So I have an image of size 9x50 as:

tensor([[ 0.4115, -0.6465, -1.6343, 0.6694, -0.8929, 0.7482, -0.6784, -1.2556,
-0.9919, 0.7736, -1.3033, -1.4822, 1.6883, 1.3857, -0.4635, -0.4117,
0.1361, 1.2751, 1.5286, -1.0493, 0.4839, -2.1620, -1.4373, -0.3013,
0.5121, 0.7913, 0.7924, -0.7720, -0.3467, 1.1353, 0.5904, -1.8757,
0.5789, -2.0829, 1.2716, -0.2533, -0.6339, 0.5726, -0.1584, 1.2937,
-0.6060, -0.7181, -1.1443, 0.1927, 0.0326, -1.3743, -0.5325, 0.7743,
-1.0776, 0.5832],
[-0.4022, -0.0806, 0.6202, 1.4176, -0.0325, 0.2146, 0.4789, 0.2615,
-1.9354, -0.9925, -1.3699, 1.4623, 1.1422, 0.4273, 0.7865, 0.4704,
0.7516, -0.8715, -0.7594, -0.3551, 0.6217, 1.5333, -1.7359, 0.7198,
-0.4480, 0.4198, 0.5431, 0.2605, -0.5880, -0.3684, 0.5031, -1.3644,
0.3791, 0.4395, -0.0098, -0.3250, -1.9895, 0.5293, 0.5274, 1.5332,
1.0197, -1.1839, 0.2819, 1.7081, 0.1653, 0.3076, -1.0679, -0.5644,
2.5712, -0.6777],
[ 0.3608, 0.7212, -1.5474, 1.0859, -0.5586, 1.3594, -1.2196, -1.5036,
0.8116, 0.6708, 0.9988, -0.7967, -0.7120, 0.5176, -1.9599, 0.2420,
0.0513, -1.1133, 0.6954, -0.4826, -1.5786, 0.1810, 0.7230, 0.4276,
-0.2598, 0.4369, -0.3106, -0.0446, 1.1185, -0.7355, 0.0219, -0.0619,
0.0329, 0.1079, 0.2461, 0.7204, 1.0873, -1.1423, 0.0986, -0.6493,
1.1245, -0.8159, 1.3520, -0.8926, 0.4020, 1.0555, -1.1234, -0.0147,
1.3508, 0.6182],
[-0.7430, 0.5251, -0.6153, -0.0003, -0.6046, 1.1388, -0.7799, -1.9012,
-0.4144, -0.0861, 0.0823, 0.6609, 1.0585, -0.5026, -0.1830, 0.8965,
0.1796, 0.7578, 0.2869, -1.3962, -1.7420, 1.7718, 1.6606, 0.5634,
-0.1225, 0.5426, -2.1004, 0.0133, 0.7839, 1.8201, -0.0306, 0.2149,
-0.2372, -0.3642, 0.3713, 0.1301, -0.2877, -0.4470, -0.1347, -1.3249,
0.6950, 0.0947, -0.0682, -0.3107, -0.5063, -0.1554, 0.5312, 0.2986,
-0.7677, 0.9213],
[ 1.1000, -0.6128, -0.6937, -0.9583, 0.2561, -0.0408, 0.5273, -0.1111,
-0.3420, 0.9789, -0.5763, -0.3564, -1.1349, 0.2419, 1.0597, 1.3880,
0.3580, -1.2515, -0.1734, 0.2403, -0.2600, 0.4373, -0.5632, 0.5021,
1.9840, -0.5519, -1.5868, 1.2105, 1.0267, 1.4813, -1.5021, 1.6625,
-0.9624, 1.4024, 2.0388, 0.0238, 0.3076, 1.6528, -0.4595, -0.7159,
-0.8997, -1.8804, -1.1647, 1.8108, -1.4731, -1.1084, 0.5496, 1.5376,
0.1698, -0.4175],
[-0.7766, -2.0425, -0.8977, 0.0425, 1.8165, -0.6411, 0.1768, -0.7219,
-0.4880, -0.4142, 0.7928, -0.5951, 1.1639, -0.0928, -0.3169, 1.5937,
-1.1871, 0.2590, -0.1274, 0.1017, -1.0488, 0.1753, 0.5793, 0.1125,
-0.4837, -1.4312, 0.0187, -0.6604, -0.3871, 1.6479, -1.4328, 0.9142,
0.0699, 0.8660, 1.0728, -0.8291, 1.0222, 0.1272, -0.5531, 0.8532,
0.5304, 0.4040, -0.7247, 0.1954, -0.2499, 0.9694, 0.8410, -0.1247,
-1.5646, -1.3319],
[-0.2229, -1.6662, 0.5105, -0.2770, 0.3966, 1.0326, 0.9928, -0.4494,
0.6234, 0.4386, -0.6726, 1.1923, 1.1223, -0.5312, 0.2890, -0.8353,
-1.3872, -0.2604, 1.7785, 0.2281, 0.8691, 0.8132, -0.0213, -1.0649,
0.3980, -0.2038, 1.5023, 0.3054, 0.8736, 1.8556, -1.3965, 1.0579,
-0.0868, -0.3515, -1.2344, -0.2689, 1.1425, 0.1928, 1.0721, -1.5331,
-0.2131, 1.0340, 1.6211, 0.2218, 1.7555, 0.3581, 2.6108, -0.1747,
0.1864, 0.0211],
[-2.1773, 0.4278, 0.2847, 0.4405, 0.9457, -0.1819, -0.3713, 1.0402,
-0.9497, -0.0645, 0.1729, -0.6848, 0.2156, -0.0078, 0.3848, -0.4249,
1.2975, -0.4167, 0.0660, 1.6326, -0.4543, 0.7339, 0.6010, 0.8946,
1.2881, -1.0936, 1.1421, -0.5225, 0.1843, -1.0033, 0.1155, -0.4692,
1.5356, 0.1045, -1.0899, 2.0136, 1.7887, 2.1656, -1.2265, -0.0519,
0.0472, 0.2626, -0.5554, 1.6628, -1.0357, 0.4898, 1.1277, 0.0699,
0.4967, 0.8722],
[ 0.7352, -0.6486, -0.6952, 2.6622, 0.2339, -0.0961, 1.8036, -0.3650,
-1.2539, -0.0111, 0.6007, -0.3418, -0.9551, 0.3020, 1.3864, -0.0676,
0.8362, 0.1694, -0.5506, 0.4202, 0.2058, -0.9739, 1.5484, -1.0143,
-0.7052, -0.2831, 0.7834, -3.0195, 0.3679, 0.9377, 0.0174, -1.4630,
-0.4082, -0.5332, 0.8701, -0.5404, 1.8485, -1.9600, -0.1757, -0.1020,
-0.1524, -1.8317, -0.0961, 1.1949, -1.2083, 1.7236, -0.2691, 0.9958,

and it has labels of size 9x1 like so,

[3, 3, 7, 0, 8, 3, 3, 3, 1],

between 0-8 (no one hot encoding)

which means each row of 50 dimensions in this image has a label. To summarize, for one image:

9x50 --> 9x1

and we have a batch size of 128.

So, 128x9x50 maps to 128x9.

Since each image has 9 labels, can we apply cross entropy? I am not sure how to use it in 2D, though.
I want to take a max of probabilities on a single dimension (dim=1) of 128x9x9.

Or do you suggest some other loss function? My labels are class indexes.

Regards

One possible approach would be to just reshape the rows of your images and the target into the batch dimension, so that your batches will be `batch_size*9`.
However, your input would be 1-dimensional afterwards.
Would this be a possibility or would you like to process the input somehow as image data?
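
A minimal sketch of the reshaping idea, looking only at the loss call. The `logits` tensor stands in for a hypothetical model output of shape 128x9x9 (batch, rows, classes), and all values are random. The second variant relies on `nn.CrossEntropyLoss` accepting an `(N, C, d1)` input with an `(N, d1)` target, which recent PyTorch versions support.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch_size, num_rows, num_classes = 128, 9, 9
logits = torch.randn(batch_size, num_rows, num_classes)       # hypothetical model output
target = torch.randint(0, num_classes, (batch_size, num_rows))

criterion = nn.CrossEntropyLoss()

# Option 1: fold the rows into the batch dimension.
# logits: (128*9, 9), target: (128*9,)
loss_flat = criterion(logits.view(-1, num_classes), target.view(-1))

# Option 2: keep the batch intact and move the class dimension to dim 1.
# logits: (128, 9 classes, 9 rows), target: (128, 9 rows)
loss_perm = criterion(logits.permute(0, 2, 1), target)

# Predicted class per row: argmax over the class dimension.
pred = logits.argmax(dim=2)                                   # shape (128, 9)

print(loss_flat.item(), loss_perm.item(), pred.size())
```

Both variants average over the same 128*9 row-level predictions, so the two losses should match up to floating-point error.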

That’s a good take for images, but let me tell you my real motive!
Each image, as we call it here, is actually a sentence and each row is a word.

It’s very important for my convolution layers to view a bunch of words together, so that they don’t only train on the word and its label but on the word’s context as well.

Thanks for the explanation.
Your idea is interesting, as you could process the input as images, i.e. the conv kernels would see some words (rows) and the coding of the words (columns).
As each word has a target, I think we still should reshape the data and target at the end.
Here is a small code example with some comments:

```
import torch
import torch.nn as nn
import torch.nn.functional as F


class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.conv1 = nn.Conv2d(1, 3, 3, 1, 1)  # Use square kernels
        self.pool1 = nn.MaxPool2d((1, 2))  # Pool only in width (word embeddings)
        self.conv2 = nn.Conv2d(3, 1, 3, 1, 1)
        self.pool2 = nn.MaxPool2d((1, 2))

        self.fc1 = nn.Linear(12, 9)  # Width 50 -> 25 -> 12 after the two pooling layers

    def forward(self, x):
        x = F.relu(self.pool1(self.conv1(x)))
        x = F.relu(self.pool2(self.conv2(x)))
        x = x.permute(0, 2, 1, 3)  # Swap channels with height: [batch, 9, 1, 12]
        x = x.view(x.size(0) * x.size(1), -1)  # Fold words into the batch dim: [batch*9, 12]
        x = self.fc1(x)

        return x


batch_size = 12
x = torch.randn(batch_size, 9, 50)
target = torch.randint(0, 9, (batch_size, 9))  # Assuming your target has 9 values

model = MyModel()
criterion = nn.CrossEntropyLoss()
# The optimizer was not defined in the original snippet; SGD and lr are placeholder choices
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

for epoch in range(100):
    optimizer.zero_grad()  # Clear gradients from the previous step
    output = model(x.unsqueeze(1))  # Unsqueeze to add channel dim
    loss = criterion(output, target.view(-1))  # Flatten target
    loss.backward()
    optimizer.step()

    print('Epoch {}, loss {}'.format(epoch, loss.item()))
```

I will get back to you by tomorrow. I’ll try this after work hours!
Thanks

My apologies. I haven’t yet been able to apply your idea and experiment as promised. I am surrounded by a number of issues and I have to deliver a base draft within a deadline.