Why can't the output and target of the cross-entropy loss function have the same size?

I’m learning to use PyTorch to solve a multi-item, multi-feature, time-sequence prediction problem.
In brief, my question is why the output and target of the cross-entropy loss function cannot have the same size.
For instance, the size of the output is (batch_size, num_items), where each element is a value fitted to the ground-truth class, like matrix A:
[[ 0.5 2.1 4.8 3.2 ]
[ 5.0 4.3 2.7 0.2 ]
[ 3.7 0.3 2.0 1.5 ]]
(Each row is one sample in the batch, and each column is one item.)
Accordingly, the size of the target is also (batch_size, num_items), like matrix B:
[[ 1 0 2 1 ]
[ 0 3 2 1 ]
[ 1 1 1 0 ]]
I suppose the most suitable loss function for my model in PyTorch is cross-entropy (one of the pointwise methods?), but if that’s not true, please correct me.
Based on the PyTorch docs for the loss functions:
Input (output of the model): (N, C) where C = number of classes, or (N, C, d_1, d_2, …, d_K) in the case of K-dimensional loss.
Target: (N), where each value is in the range [0, C−1], or (N, d_1, d_2, …, d_K) in the case of K-dimensional loss.
How can I transform or squeeze the shapes of my output and target to meet the requirements of the cross-entropy loss function? Thanks a lot.

nn.CrossEntropyLoss is used for multi-class classification use cases, i.e. each sample belongs to a single target class.
In your example, num_items in the output would correspond to the number of classes.
Each row would give you the logits for that sample belonging to the class at column index x.
Based on this, the target should only contain the target class indices, which are used to calculate the cross-entropy loss.
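To make the shapes concrete, here is a minimal sketch of the standard use case (the tensor values are made up for illustration):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

batch_size, num_classes = 3, 4
logits = torch.randn(batch_size, num_classes)  # model output, shape (N, C)
target = torch.tensor([2, 0, 1])               # class indices, shape (N,), values in [0, C-1]

loss = criterion(logits, target)  # works: one logit per class, one index per sample
```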

If your targets might take arbitrary values for each element of your output, you might want to use a loss function like nn.MSELoss instead.
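For comparison, a minimal sketch of the regression case, where output and target do share the same shape:

```python
import torch
import torch.nn as nn

criterion = nn.MSELoss()
output = torch.randn(3, 4)  # shape (batch_size, num_items)
target = torch.randn(3, 4)  # same shape as the output
loss = criterion(output, target)
```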

Hi there, thank you for your quick and detailed reply.
Actually, each element of the output, namely A(i, j), is a single sample. So num_items is not the number of classes, but just another sample dimension, like batch_size. The value of A(i, j) is the output of a linear layer, expected to be as close as possible to the ground-truth class value, which is B(i, j). In the target B, the element values are indeed strictly within [0, C−1].
So I think it can be seen as a multi-class classification problem: the task is to correctly classify each element of output A against target B. The problem is that output A does not have a num_classes dimension. Can I just expand A with a class dimension in some way?

In that case your output should have the shape [batch_size, nb_classes, num_items], which could be seen as a temporal signal, where each sample contains the class logits for nb_classes.
nn.CrossEntropyLoss would then maximize the output of the neuron that corresponds to the target class.
Would that be possible using your model?
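For illustration, a quick sketch of the shapes I have in mind (the sizes here are placeholders, not taken from your model):

```python
import torch
import torch.nn as nn

batch_size, nb_classes, num_items = 3, 5, 4

criterion = nn.CrossEntropyLoss()
logits = torch.randn(batch_size, nb_classes, num_items)         # (N, C, d_1)
target = torch.randint(0, nb_classes, (batch_size, num_items))  # (N, d_1), class indices

loss = criterion(logits, target)  # the K-dimensional case from the docs
```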

You mean in the last layer, make the output have the shape [batch_size, nb_classes, num_items]?
That’s possible; I can let the dense layer fill in the class dimension. In this model, num_classes is about 2000~3000, while the model has only around 100 input features; would that be a problem? Another question is how to transform the output into the same shape as the target, since the final result needs it to be (my tentative sketch is below).
Actually, the exact output values are not important in this example. If the target is [0 1 2 3], then an output of [0 2 5 8] or [3 4 9 21] would give the same final result. In this sense, it’s a Learning-to-Rank problem. I just want to compare nn.CrossEntropyLoss to other pointwise, pairwise, and especially listwise loss functions.
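My tentative understanding (please correct me if this is wrong): an argmax over the class dimension would collapse the output back into the target's shape.

```python
import torch

logits = torch.randn(3, 5, 4)        # (batch_size, nb_classes, num_items)
predictions = logits.argmax(dim=1)   # -> (batch_size, num_items), same shape as the target
```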

If you know of any better listwise loss functions for the LtoR problem in PyTorch, please let me know. I searched the documentation quite a few times; nn.CrossEntropyLoss is the best I could find… Thank you very much.