Input and target size mismatch

I am trying to implement one-hot encoding for MNIST imported from Kaggle. The shape of the encoding is [1, 10] but when the loss function runs, it throws the following error:

ValueError: Expected input batch_size (10) to match target batch_size (256).

My mini-batch size is 256.
What should I do?

For a classification use case you would most likely use a loss function like nn.NLLLoss or nn.CrossEntropyLoss, which both expect a target tensor containing class indices instead of a one-hot encoded representation.
How are you currently trying to calculate your loss?
Could you post some code so that we could have a look?

I noticed in the CIFAR-10 tutorial on the PyTorch website that the output layer had 10 units while the label was a scalar. Following that example, I used a one-hot vector as the label and an output layer of 10 units, but that didn’t work. Anyway, can you please help me understand how the cross entropy loss is calculated in the CIFAR-10 example?

Sure, the model in the tutorial outputs 10 logits in its last linear layer. As you can see, no non-linearity was used on this layer, so that the values represent the raw logits for all 10 classes.
If you call softmax on them, you would get the probabilities for each class, but we don’t want to do that now.
Our label tensor contains just the current class index for each sample.
So, if your current sample belongs to class 2, we just use the index 2 in the label instead of a representation like [0, 0, 1, 0, 0, 0, 0, 0, 0, 0].

The shape of your model output should be [batch_size, number_of_classes], while label should be [batch_size].

Here is a small example just showing how to use the criterion:

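import torch
import torch.nn as nn

batch_size = 5
nb_classes = 10
model = nn.Linear(20, nb_classes)

x = torch.randn(batch_size, 20)
# random class indices in [0, nb_classes - 1], shape [batch_size]
target = torch.empty(batch_size, dtype=torch.long).random_(nb_classes)

criterion = nn.CrossEntropyLoss()

output = model(x)  # raw logits, shape [batch_size, nb_classes]
loss = criterion(output, target)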
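
If your labels are already one-hot encoded, one option is to turn them back into class indices with argmax along the class dimension. This is only a minimal sketch: the logits and labels below are made up for illustration, and I'm assuming the one-hot labels have the shape [batch_size, nb_classes].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

batch_size, nb_classes = 4, 10
logits = torch.randn(batch_size, nb_classes)  # stand-in for the model output

# Hypothetical one-hot labels, shape [batch_size, nb_classes]
class_idx = torch.tensor([3, 0, 9, 3])
one_hot = F.one_hot(class_idx, num_classes=nb_classes)

# argmax over the class dimension recovers the indices, shape [batch_size]
target = one_hot.argmax(dim=1)

loss = nn.CrossEntropyLoss()(logits, target)
print(target.shape, loss.item())
```

The key point is that the criterion receives a target of shape [batch_size] holding integer class indices.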

Thanks a lot. I was really stuck on this one

Hey, I have a similar question regarding cross entropy, but for multi-class prediction. The output of my network, the logits, has a shape of [5, 6, 256, 256]; the number of classes is 6. The target has a shape of [5, 256, 256]. Since this criterion combines LogSoftMax and ClassNLLCriterion in one single class, cross entropy expects the logits and the target to have different shapes, right? At least the following didn’t give me an error:
criterion = nn.CrossEntropyLoss()
loss = criterion(logit, true_masks)

I also have another question regarding the loss value of cross entropy. My initial loss values are larger than 1. According to some sanity-check posts out there, this points to a bad weight initialization. My activation function is leaky_relu, so I used He initialization for the weights, for example like this:
self.down = nn.Sequential(
    nn.ZeroPad2d(1),
    nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
              kernel_size=scale, stride=scale))
self.down[1].weight = nn.Parameter(
    nn.init.kaiming_normal_(
        torch.empty((out_channels, in_channels, scale, scale), dtype=torch.float32),
        nonlinearity='leaky_relu'),
    requires_grad=True)
I compared this to the default initialization, and He initialization gives me even larger initial losses. My validation accuracy is lower than 50% for the first 30 epochs, which I have to keep exploring, since the data is complex. Can you please give me any hints? Thank you.

Yes, the shapes look good. The target should contain values in the range [0, 5], which seems to be the case.
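
As a quick shape check, here is a small sketch with random tensors. I'm using 16x16 instead of 256x256 just to keep it light, and the tensor names are only placeholders:

```python
import torch
import torch.nn as nn

batch_size, nb_classes, height, width = 5, 6, 16, 16

# logits: [batch_size, nb_classes, height, width]
logits = torch.randn(batch_size, nb_classes, height, width)
# target: [batch_size, height, width] with class indices in [0, nb_classes - 1]
true_masks = torch.randint(0, nb_classes, (batch_size, height, width))

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, true_masks)  # scalar, averaged over all pixels by default
print(logits.shape, true_masks.shape, loss.item())
```

The class dimension (dim 1) is present in the logits and absent in the target, which is why the two shapes differ.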

I don’t think an initial loss value smaller than 1 is expected.
For a 6-class classification with a random accuracy of 1/6 = 0.167, the initial loss would be around -ln(1/6) = 1.79, as described in the CS231n notes.
(Karpathy used CIFAR10 and thus used the random accuracy of 0.1)
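
The numbers can be reproduced with a couple of lines of plain Python. The helper name expected_initial_loss is just something I made up for this sketch; it assumes the untrained model predicts a uniform distribution over the classes:

```python
import math

def expected_initial_loss(nb_classes):
    # Cross entropy of a uniform prediction: -ln(1 / nb_classes)
    return -math.log(1.0 / nb_classes)

print(round(expected_initial_loss(6), 2))   # 1.79
print(round(expected_initial_loss(10), 2))  # 2.3
```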

Hey, thanks for your quick reply. I should have checked the equation behind cross entropy. So my initial loss values are roughly in the normal range.

I am curious about this topic as well, not necessarily vision but a classification.

I am using BCELoss(). My prediction has shape torch.Size([1, 32, 1]) and my labels have shape torch.Size([32]), which results in a UserWarning: Using a target size (torch.Size([32])) that is different to the input size (torch.Size([1, 32, 1])) is deprecated.

Does this affect the performance of my model? Should I reshape my labels or my predictions before feeding them into the loss function?


@raceee It won’t, for now (it is a warning only).

But it would be better for future compatibility if you reshape your prediction.

You can use pred.squeeze() to remove the redundant dims.
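
A minimal sketch with random data, using the shapes from your post (the values themselves are made up):

```python
import torch
import torch.nn as nn

pred = torch.rand(1, 32, 1)                   # values in [0, 1), like a sigmoid output
labels = torch.randint(0, 2, (32,)).float()   # BCELoss expects float targets

pred = pred.squeeze()                         # [1, 32, 1] -> [32]
loss = nn.BCELoss()(pred, labels)
print(pred.shape, labels.shape, loss.item())
```

Note that squeeze() without arguments drops every dimension of size 1, so pred.view(-1) or squeeze(dim) might be the safer choice if one of the dimensions you want to keep can be 1.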

OK, does it matter which one I reshape? That is, if I am getting this warning, should I reshape the preds or the labels? Or does it really not matter?

The labels are good. The preds should be of same shape. You can also change the shape of labels BTW.

P.S. - While BCELoss supports any shape as long as it is the same for target and output, CrossEntropyLoss expects class-index targets without the class dimension, e.g. [batch_size] for an output of [batch_size, nb_classes].

Great I’ll get the preds into the same shape as the labels.

Thanks for your help!

I have a similar problem, but with a regression CNN. I extract the output from a 4D tensor to calculate the loss against a 1D target, but the two now have different lengths:

0,outputs torch.Size([2484]), targets torch.Size([3690])
1,outputs torch.Size([3270]), targets torch.Size([3884])

How can I solve this problem? The outputs (from the CNN) and the targets have different sizes. Should I fill the size difference with zeros, or will that affect the loss calculation?

Padding tensors with zeros will affect the loss calculation and you would have to think about the actual loss calculation and what it should represent.
Since the model output and targets have different shapes you won’t be able to directly calculate the loss with e.g. MSELoss so could you explain what the shapes in both tensors represent?
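
To illustrate the first point, here is a toy sketch with made-up numbers. It compares the MSE on the overlapping part with the MSE after zero-padding the shorter tensor; neither is necessarily the right loss for your use case, it only shows that the padded zeros are treated as predictions:

```python
import torch
import torch.nn.functional as F

output = torch.tensor([1.0, 2.0, 3.0])            # shorter model output
target = torch.tensor([1.5, 2.5, 3.5, 4.0, 5.0])  # longer target

# Zero-pad the output on the right to the target length
padded = F.pad(output, (0, target.numel() - output.numel()))

loss_overlap = F.mse_loss(output, target[:output.numel()])
loss_padded = F.mse_loss(padded, target)

print(loss_overlap.item())  # 0.25
print(loss_padded.item())   # ~8.35, dominated by the padded positions
```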

Actually, my task is to build a model that can predict the borders of images.
I started by cropping the borders (with a fixed random int between 5 and 15) and keeping them as targets (I resized all images to 90*90 before cropping).
I also extract a mask of the same size with zeros on the borders.
Then I built a collate function to stack the batches and pass the cropped images and masks (their shape is [batch_size, 1, 90, 90]) and the targets ([batch_size, max_tar_len in batch]).
Finally, I pass the cropped images to the model and get an output with the same shape. I then multiply this output by the masks and extract the values != 1. When I loop over each batch to compare the shapes (target, output), I find that they are different, as I mentioned in my last comment.