Multiclass classification with nn.CrossEntropyLoss

The documentation for nn.CrossEntropyLoss states:

The input is expected to contain scores for each class.
input has to be a 2D Tensor of size (minibatch, C).
This criterion expects a class index (0 to C-1) as the target for each value of a 1D tensor of size minibatch

However, the following code appears to work:

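import torch
import torch.nn as nn

loss = nn.CrossEntropyLoss()
# scores with shape (minibatch=15, classes=3, seq_len=10)
input = torch.randn(15, 3, 10, requires_grad=True)
# class indices in [0, 3) with shape (minibatch=15, seq_len=10)
target = torch.randint(0, 3, (15, 10))
output = loss(input, target)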

So my input is a 3D tensor and my target is a 2D tensor.

Am I right in thinking that CrossEntropyLoss is interpreting my input as (minibatch, N_CLASSES, SEQ_LEN) and my target as (minibatch, SEQ_LEN)?

The reason I am trying to do this is that I am doing multiclass classification: each element of my minibatch is a sequence of 10 elements, and each of those elements can be classified into one of 3 classes.

It seems you are right.
I tested it with this small example:

import torch
import torch.nn as nn
import torch.nn.functional as F

loss = nn.CrossEntropyLoss(reduction='none')  # keep one loss per element
input = torch.randn(2, 3, 4, requires_grad=True)
target = torch.randint(0, 3, (2, 4))
output = loss(input, target)  # shape (2, 4)

# plain 2D case for sample 0, sequence positions 0 and 1
loss1 = F.cross_entropy(input[0:1, :, 0], target[0:1, 0])
loss2 = F.cross_entropy(input[0:1, :, 1], target[0:1, 1])

loss1 and loss2 match output[0, 0] and output[0, 1], the first two elements of output, so apparently it’s working.
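
To extend that spot check to every position at once, here is a minimal sketch. It assumes a recent PyTorch version where F.cross_entropy accepts reduction='none'; the sizes N, C and L are arbitrary illustration values. It flattens the 3D input to the documented 2D form and compares the per-element losses.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, C, L = 2, 3, 4                      # minibatch, classes, sequence length
input = torch.randn(N, C, L)
target = torch.randint(0, C, (N, L))   # one class index per sequence element

# per-element loss computed directly on the 3D input, shape (N, L)
per_element = F.cross_entropy(input, target, reduction='none')

# same data flattened to the 2D form: (N*L, C) scores and (N*L,) targets
flat_input = input.permute(0, 2, 1).reshape(N * L, C)
flat_target = target.reshape(N * L)
flat = F.cross_entropy(flat_input, flat_target, reduction='none')

# True when the class dimension is dim 1 and the rest are treated as positions
same = torch.allclose(per_element.reshape(-1), flat)
print(same)
```

If the two results agree, that supports reading the input as (minibatch, N_CLASSES, SEQ_LEN) with one independent classification per sequence position.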

Hi Vellamike,

I still don’t get why you would want to do that.
CrossEntropy simply compares the scores that your model outputs against a one-hot encoded vector where the 1 is in the index corresponding to the true label.

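If your inputs are sequences of length 10, then you need to build a model that accepts 10 inputs and applies a transformation into 3 outputs, which will be your feature vector, or scores, for the classes. You can then apply softmax to normalize them into probabilities, although nn.CrossEntropyLoss itself expects the raw scores because it applies log-softmax internally.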

This is the point where you use CELoss to compute the difference between the output of your model and the true labels.
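
To make that comparison concrete, here is a plain-Python sketch of the quantity being computed for a single example. The scores and label are made-up illustration values and the helper names are hypothetical; nn.CrossEntropyLoss performs the softmax step internally on the raw scores.

```python
import math

def softmax(scores):
    # subtract the max for numerical stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, label):
    # negative log-probability assigned to the true class index
    return -math.log(softmax(scores)[label])

def cross_entropy_one_hot(scores, one_hot):
    # same quantity written as a comparison against a one-hot vector
    probs = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))

scores = [2.0, 0.5, -1.0]    # hypothetical raw scores for 3 classes
label = 0                    # true class as an index
one_hot = [1.0, 0.0, 0.0]    # the same label as a one-hot vector

loss_index = cross_entropy(scores, label)
loss_one_hot = cross_entropy_one_hot(scores, one_hot)
```

Both forms give the same value, which is why a class index is enough as the target and no explicit one-hot vector has to be built.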

I hope it helps.


@PabloRR100 Sorry for not answering earlier - I just saw your reply.

The reason I want to do that is that I am building a sequence-to-sequence network. My labels are sequences themselves: I have one label per element of my sequence. So it is not each sequence that falls into one of 3 classes, but each element of the sequence. For a sequence of length 10, I therefore have a rank-2 tensor (dim 3x10), which you can think of as 10 one-hot encoded vectors of length 3.
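
A rough sketch of that setup, for illustration only: the LSTM, the linear head and all sizes are assumptions, not the actual network. The labels are stored as class indices of shape (minibatch, SEQ_LEN) rather than as one-hot vectors, and the model output is permuted so the class dimension comes second.

```python
import torch
import torch.nn as nn

# hypothetical sizes: minibatch, sequence length, input features, hidden size, classes
N, L, F_IN, H, C = 15, 10, 8, 16, 3

rnn = nn.LSTM(input_size=F_IN, hidden_size=H, batch_first=True)
head = nn.Linear(H, C)
criterion = nn.CrossEntropyLoss()

x = torch.randn(N, L, F_IN)
target = torch.randint(0, C, (N, L))   # one class index per sequence element

hidden, _ = rnn(x)                     # (N, L, H)
logits = head(hidden)                  # (N, L, C), raw scores without softmax

# move the class dimension to position 1: (N, L, C) -> (N, C, L)
loss = criterion(logits.permute(0, 2, 1), target)
loss.backward()
```

The permute is only needed because this particular model emits (minibatch, SEQ_LEN, N_CLASSES); a model that already produces (minibatch, N_CLASSES, SEQ_LEN) can pass its output straight to the loss.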
