I am trying to implement one-hot encoding for MNIST imported from Kaggle. The shape of the encoding is [1, 10] but when the loss function runs, it throws the following error:

ValueError: Expected input batch_size (10) to match target batch_size (256).

For a classification use case you would most likely use a loss function like nn.NLLLoss or nn.CrossEntropyLoss, which both expect a target tensor containing class indices instead of a one-hot encoded representation.
How are you currently trying to calculate your loss?
Could you post some code so that we could have a look?

I spotted in the CIFAR-10 tutorial on the PyTorch website that the output layer had 10 units while the label was a scalar. Following that example, I made a one-hot vector as the label and an output layer with 10 units, but that didn’t work. Can you please help me understand how the cross-entropy loss is calculated in the CIFAR-10 example?

Sure, the model in the tutorial outputs 10 logits from its last linear layer. As you can see, no non-linearity is applied to this layer, so the values represent the raw logits for all 10 classes.
If you called softmax on them, you would get the probabilities for each class, but we don’t want to do that here.
Our label tensor contains just the current class index for each sample.
So, if your current sample belongs to class 2, we just use the index 2 as the label instead of a representation like [0, 0, 1, 0, 0, 0, 0, 0, 0, 0].
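If you already have one-hot encoded targets, you can recover the class indices with argmax — a small sketch (the one_hot tensor here is made up for illustration):

```python
import torch

# Hypothetical one-hot encoded targets, shape [batch_size, num_classes]
one_hot = torch.tensor([[0, 0, 1],
                        [1, 0, 0]])
# nn.CrossEntropyLoss expects class indices of shape [batch_size] instead,
# which argmax along the class dimension recovers:
target = one_hot.argmax(dim=1)
print(target)  # tensor([2, 0])
```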

The shape of your model output should be [batch_size, number_of_classes], while the label should have the shape [batch_size].

Here is a small example just showing how to use the criterion:

import torch
import torch.nn as nn

batch_size = 5
nb_classes = 10
model = nn.Linear(20, nb_classes)
x = torch.randn(batch_size, 20)
# Class indices in [0, nb_classes - 1], shape [batch_size]
target = torch.empty(batch_size, dtype=torch.long).random_(nb_classes)
criterion = nn.CrossEntropyLoss()
output = model(x)  # shape [batch_size, nb_classes]
loss = criterion(output, target)

Hey, I have a similar question regarding cross entropy, but for multi-class prediction. The output of my network, the logits, has the shape [5, 6, 256, 256]; the number of classes is 6 here. The target has the shape [5, 256, 256]. Since this criterion combines LogSoftmax and NLLLoss in one single class, cross entropy expects the logits and the target to have different sizes, right? At least criterion = nn.CrossEntropyLoss(); loss = criterion(logit, true_masks) didn’t give me an error.

And I have another question regarding the loss value of cross entropy: my initial values are larger than 1. According to some sanity-check posts out there, this points to a bad initialization of the weights. My activation function is leaky_relu, so I used the He distribution to initialize the weights, for example:

self.down = nn.Sequential(
    nn.ZeroPad2d(1),
    nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
              kernel_size=scale, stride=scale))
self.down[1].weight = nn.Parameter(
    nn.init.kaiming_normal_(
        torch.empty((out_channels, in_channels, scale, scale), dtype=torch.float32),
        nonlinearity='leaky_relu'),
    requires_grad=True)

I compared this to the default initialization: using the He distribution gives me even larger initial losses. My validation accuracy is lower than 50% for the first 30 epochs, which I have to keep exploring, since the data is complex. Can you please give me any hints? Thank you.
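As a side note, He initialization can also be applied in place via nn.init.kaiming_normal_ instead of rebuilding the weight as a new nn.Parameter — a sketch with made-up channel and scale values:

```python
import torch
import torch.nn as nn

# Hypothetical values chosen just for illustration
in_channels, out_channels, scale = 3, 16, 2
down = nn.Sequential(
    nn.ZeroPad2d(1),
    nn.Conv2d(in_channels, out_channels, kernel_size=scale, stride=scale))
# kaiming_normal_ rewrites the existing weight tensor in place
with torch.no_grad():
    nn.init.kaiming_normal_(down[1].weight, nonlinearity='leaky_relu')
```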

Yes, the shapes look good. The target should contain values in the range [0, 5], which seems to be the case.
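To make those shapes concrete, a minimal sketch matching the multi-class case above with random tensors:

```python
import torch
import torch.nn as nn

batch_size, nb_classes, H, W = 5, 6, 256, 256
logits = torch.randn(batch_size, nb_classes, H, W)   # [5, 6, 256, 256]
# The target holds a class index in [0, 5] for every pixel: [5, 256, 256]
target = torch.randint(0, nb_classes, (batch_size, H, W))
criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)  # scalar, averaged over all pixels
```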

I don’t think a loss value smaller than 1 would be expected at the beginning of training.
For a 6-class classification with a random accuracy of 1/6 ≈ 0.167, the initial loss should be around -ln(1/6) ≈ 1.79, as described in the CS231n notes.
(Karpathy used CIFAR-10 and thus a random accuracy of 0.1, which gives -ln(0.1) ≈ 2.3.)
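You can verify this expected initial loss with a quick sketch: with all-zero logits the softmax is uniform, so the cross-entropy loss equals ln(6) exactly, regardless of the targets:

```python
import math
import torch
import torch.nn as nn

nb_classes = 6
logits = torch.zeros(100, nb_classes)          # uniform predictions
target = torch.randint(0, nb_classes, (100,))  # arbitrary class indices
loss = nn.CrossEntropyLoss()(logits, target)
print(loss.item())  # ≈ 1.7918, i.e. math.log(6)
```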

Hello,
I am curious about this topic as well, not necessarily vision but a classification.

I am using BCELoss(); my prediction has the shape torch.Size([1, 32, 1]) and my labels have the shape torch.Size([32]), which results in: UserWarning: Using a target size (torch.Size([32])) that is different to the input size (torch.Size([1, 32, 1])) is deprecated.

Does this affect the performance of my model? Should I reshape my labels or my predictions before feeding them into the loss function?
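One way to resolve that warning, assuming the leading and trailing size-1 dimensions carry no information, is to squeeze the prediction so both tensors have the shape [32] — a sketch with random data:

```python
import torch
import torch.nn as nn

pred = torch.rand(1, 32, 1)                  # sigmoid-like outputs in [0, 1)
labels = torch.randint(0, 2, (32,)).float()  # binary targets, shape [32]

criterion = nn.BCELoss()
# squeeze() drops the size-1 dims: [1, 32, 1] -> [32], matching the labels
loss = criterion(pred.squeeze(), labels)
```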

I have a similar problem, but in a regression CNN: I extract the output from a 4D tensor to calculate the loss against a 1D target, but the two have different sizes (different lengths).

How can I solve this problem? Should I fill the size difference with zeros, or will that affect the loss calculation?

Padding tensors with zeros will affect the loss calculation, so you would have to think about what the loss should actually represent.
Since the model output and targets have different shapes, you won’t be able to directly calculate the loss with e.g. MSELoss. Could you explain what the shapes of both tensors represent?

Actually, my task is to build a model that can predict the borders of images.
So I started by cropping the borders (with a fixed random int between 5 and 15) and keeping them as targets (I resized all images to 90×90 before cropping),
and I also extract a mask of the same size with zeros on the borders.
Then I built a collate function to stack the batches and passed in the cropped images and masks (their shape will be [batch_size, 1, 90, 90]) and the targets ([batch_size, max_tar_len in batch]).
Finally, I pass the cropped images through the model and get an output of the same shape; I then multiply this output by the masks and extract the numbers != 1. When I loop over each batch to compare the shapes (target, output), I find they are different, as I mentioned in the last comment.