How to compute cross entropy loss for classification in tensor

I have N classes, and the output of my convolution has shape
BxNxDxD, where B is the batch size, N is the number of classes, and D is the spatial dimension of the output.
I am trying to re-implement SSD object detection.

So basically, if I call my output Out, then Out[0,:,0,0] holds the classification scores for position (0,0).
I made my GT the same shape as Out and passed Out through nn.Softmax2d, i.e. Out = nn.Softmax2d()(Out), but I don't know how to use the GT and Out to compute the cross entropy loss at the end.

Your target should have the dimensions [batch_size, height, width] filled with the corresponding class indices.
The docs on nn.CrossEntropyLoss give you an example.
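
To make the expected shapes concrete, here is a minimal sketch of the 4D case. The sizes are made up for illustration, and random tensors stand in for the network output and the ground truth.

```python
import torch
import torch.nn as nn

# Illustrative sizes only (assumptions, not taken from the question).
batch_size, num_classes, height, width = 4, 80, 7, 7

# Raw, unnormalized scores from the network: [B, N, H, W].
logits = torch.randn(batch_size, num_classes, height, width)

# One class index per spatial position: [B, H, W], dtype int64.
target = torch.randint(0, num_classes, (batch_size, height, width))

criterion = nn.CrossEntropyLoss()  # applies log-softmax internally
loss = criterion(logits, target)   # scalar, averaged over all B*H*W positions
print(loss.shape, loss.item())
```

Note that the logits go in without any softmax applied, and the target carries class indices rather than a one-hot encoding.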

Just flatten everything in one consistent order. Let's say your final feature map is 7 x 7, the batch size is 4, and the number of classes is 80. Then the output tensor should be 4 x 80 x 7 x 7. Here are the steps to compute the loss:

import torch.nn.functional as F

class_number = 80
# Flatten the batch size and 7x7 feature map to one dimension
out = out.permute(0, 2, 3, 1).contiguous().view(-1, class_number) # size is 196 x 80
# What about the order? Why the permute? And what the *ell is contiguous?
# Quick answer: permute moves the class dimension last, and contiguous copies
# the data into that memory order so that view can merge the other dimensions.
# The order along the first dimension is then as expected, which is
# First row of the 7x7 feature map of first batch
# Second row of feature map of first batch
# Third row of first batch
# ...
# First row of second batch
# ...
# Last row of last batch

# You can refer to
# for how pytorch's `view` works, and why I put a permute and contiguous there

# Now regard `out` as 196 predictions,
# each with scores for 80 classes.
# Arrange the ground-truth targets in the same order;
# they should form a vector of 196 class indices
# (integers in [0, 80), dtype long).
targets = targets_vector() # implement this function

# get loss, no need to do softmax
loss = F.cross_entropy(out, targets)

I have re-implemented S3FD. It's quite similar to SSD; both are anchor (default box) based detectors. Here is what I did for this
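
For anyone who wants to check the ordering argument, here is a small self-contained sketch (random tensors, sizes from the 4 x 80 x 7 x 7 example; the variable names are mine). It flattens the prediction and a [B, H, W] index target the same way and compares the result with calling the loss directly on the 4D tensors.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, C, H, W = 4, 80, 7, 7  # illustrative sizes

out = torch.randn(B, C, H, W)                # raw scores, no softmax applied
target_map = torch.randint(0, C, (B, H, W))  # class index per position

# Move the class dimension last, then merge batch and spatial dimensions.
flat_out = out.permute(0, 2, 3, 1).contiguous().view(-1, C)  # [196, 80]
# The target is already ordered (batch, row, column), so a plain view matches.
flat_target = target_map.view(-1)                            # [196]

loss_flat = F.cross_entropy(flat_out, flat_target)
loss_direct = F.cross_entropy(out, target_map)  # same loss on the 4D input

print(flat_out.shape, flat_target.shape)
print(torch.allclose(loss_flat, loss_direct, atol=1e-5))
```

If the two losses agree, the flattened predictions and targets are lined up position by position.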

Thank you!
I was always confused about why people do that!

Just for clarification: if I have several anchor boxes (let's say 2, for example), my prediction (and ground truth) will be Bx2CxDxD instead of BxCxDxD. Can I still use the aforementioned method to compute the loss?

Of course, just make sure the ground truth and the predictions are in the same order. The right way to reshape could be:

# if your output is B x 2 x C x D x D
output.permute(0, 3, 4, 1, 2).contiguous().view(-1, C)
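
A fuller sketch of the anchor case, starting from a B x 2C x D x D tensor. It assumes the 2C channels are laid out anchor by anchor (all C classes of anchor 0, then all C classes of anchor 1); if your head interleaves them differently, the first view has to change accordingly. Sizes and names are illustrative.

```python
import torch
import torch.nn.functional as F

B, A, C, D = 4, 2, 66, 8  # batch, anchors per location, classes, feature-map size

# Assumption: channels [0, C) belong to anchor 0 and [C, 2C) to anchor 1.
raw = torch.randn(B, A * C, D, D)

output = raw.view(B, A, C, D, D)  # B x 2 x C x D x D
# Reorder to (batch, row, column, anchor, class), then merge all but the class dim.
flat_out = output.permute(0, 3, 4, 1, 2).contiguous().view(-1, C)  # [B*D*D*A, C]

# The ground truth must follow the same (batch, row, column, anchor) order.
target = torch.randint(0, C, (B, D, D, A))  # class index per anchor
flat_target = target.view(-1)               # [B*D*D*A]

loss = F.cross_entropy(flat_out, flat_target)
print(flat_out.shape, flat_target.shape, loss.item())
```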

Thanks a lot! Very useful

I made those changes, but I ran into a new error.
Let's say the number of classes is 66 and the feature map is 8x8 = 64 positions.
After modifying the predictions and GTs, these are the sizes of
input and target:


torch.Size([64, 66])
torch.Size([64, 66])

So I read a couple of other posts and realized that I should convert it to LongTensor.

target = target.type(torch.LongTensor).cuda()


torch.Size([64, 66])
torch.Size([64, 66])

But when I try to compute the loss, I still get this:

loss = F.cross_entropy(input, target)

RuntimeError Traceback (most recent call last)
in ()
24 print(input.size())
25 print(target.size())
---> 26 loss = F.cross_entropy(input, target)

~/anaconda3/lib/python3.6/site-packages/torch/nn/ in cross_entropy(input, target, weight, size_average, ignore_index, reduce)
1440 >>> loss.backward()
1441 """
-> 1442 return nll_loss(log_softmax(input, 1), target, weight, size_average, ignore_index, reduce)

~/anaconda3/lib/python3.6/site-packages/torch/nn/ in nll_loss(input, target, weight, size_average, ignore_index, reduce)
1330 .format(input.size(0), target.size(0)))
1331 if dim == 2:
-> 1332 return torch._C._nn.nll_loss(input, target, weight, size_average, ignore_index, reduce)
1333 elif dim == 4:
1334 return torch._C._nn.nll_loss2d(input, target, weight, size_average, ignore_index, reduce)

RuntimeError: multi-target not supported at /opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THCUNN/generic/

Can you please let me know if you have any suggestion in this regard…

As I said, the target should be a vector of class indices, not one-hot encoded.
In your case, the target should have shape [64], not [64, 66].
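
If the ground truth already exists as a one-hot tensor, one way to get the index vector is an argmax over the class dimension. This is only a sketch with random stand-in data; it assumes every row of the one-hot target has exactly one 1.

```python
import torch
import torch.nn.functional as F

num_positions, num_classes = 64, 66  # sizes from the post above

logits = torch.randn(num_positions, num_classes)

# Stand-in for a one-hot ground truth of shape [64, 66].
class_ids = torch.randint(0, num_classes, (num_positions,))
one_hot_target = F.one_hot(class_ids, num_classes).float()

# Collapse each one-hot row to its class index: shape [64], dtype int64.
target = one_hot_target.argmax(dim=1)

loss = F.cross_entropy(logits, target)
print(target.shape, target.dtype, loss.item())
```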

A target of shape [batch_size, height, width] is required for pixel-wise classification (segmentation), right?

Yes, for multi-class classification with e.g. nn.CrossEntropyLoss, that shape is right.


Given the data set your task is to predict the tags associated with a
Input - A dataframe with the
Target Column - tags
Evaluation Criteria - Cross Entropy Loss
Deliverables : A Jupyter notebook detailing the steps and approach. Design your
transformation/pipeline so that we can pass a test data frame to get the prediction.