Error with torch.nn.CrossEntropyLoss()

PatricYan · October 25, 2021, 3:50pm

import torch

loss = torch.nn.CrossEntropyLoss()
loss_values = loss(train_output, train_label)

Here train_output has shape [128, 10, 27] and train_label has shape [128, 10].

I get this error: Expected target size (128, 27), got torch.Size([128, 10])

ptrblck · October 25, 2021, 7:17pm

nn.CrossEntropyLoss expects a model output containing logits in the shape [batch_size, nb_classes, *] and a target containing class indices in the range [0, nb_classes-1] and the shape [batch_size, *].
The * denotes additional dimensions.
Assuming you are working with 27 classes, you would need to permute the model output such that the class dimension is in dim1.

PatricYan · October 26, 2021, 2:19am

So I need to change the output from [batch, seq, class] to [batch, class, seq], right? How do I do that?

ptrblck · October 26, 2021, 3:00am

This would work:

output = ... # shape [batch_size, seq_len, nb_classes]
output = output.permute(0, 2, 1).contiguous() # shape [batch_size, nb_classes, seq_len]
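To check the shapes end to end, here is a minimal sketch using random stand-in tensors with the sizes from the first post (128 samples, sequence length 10, 27 classes). The variable names logits and loss_flat are illustrative and not from the original code; the flattened variant is an equivalent alternative, assuming the default mean reduction.

```python
import torch
import torch.nn as nn

batch_size, seq_len, nb_classes = 128, 10, 27

# Random stand-ins for the model output and labels (assumed shapes from the first post).
train_output = torch.randn(batch_size, seq_len, nb_classes)         # [128, 10, 27]
train_label = torch.randint(0, nb_classes, (batch_size, seq_len))   # [128, 10]

criterion = nn.CrossEntropyLoss()

# Move the class dimension to dim1: [128, 10, 27] -> [128, 27, 10].
logits = train_output.permute(0, 2, 1)
loss_values = criterion(logits, train_label)

# Alternative: merge batch and sequence dimensions so every time step is one sample.
loss_flat = criterion(
    train_output.reshape(-1, nb_classes),  # [1280, 27]
    train_label.reshape(-1),               # [1280]
)
```

With the default reduction, both calls average over all 128 * 10 positions, so they should agree up to floating-point error.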

PatricYan · October 26, 2021, 3:19am

Hi,
I use log_softmax = nn.LogSoftmax(dim=1). With data of shape [128, 10, 27], out = log_softmax(data) has shape [128, 10, 27]. But when data has shape [128, 1, 27], out contains many 0s.

ptrblck · October 26, 2021, 3:22am

This shape wouldn’t make sense: dim1 is treated as the class dimension and has size 1 here, so you would be classifying a single class with a multi-class classification loss. The zeros returned by log_softmax correspond to a probability of 1.0 for this single class, since log(1) = 0.
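A small sketch of this effect, assuming random data in the [batch, seq_len, class] layout with seq_len = 1. The names out_dim1 and out_classes are made up for illustration.

```python
import torch
import torch.nn as nn

data = torch.randn(128, 1, 27)  # [batch, seq_len=1, classes], random stand-in data

# dim=1 has size 1, so each softmax runs over a single value:
# its probability is 1 and its log-probability is 0.
out_dim1 = nn.LogSoftmax(dim=1)(data)

# Normalizing over the class dimension instead (the last one in this layout)
# gives a proper distribution over the 27 classes.
out_classes = nn.LogSoftmax(dim=-1)(data)
```

Here out_dim1 is all zeros regardless of the input values, while exponentiating out_classes yields probabilities that sum to 1 over the 27 classes.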

PatricYan · October 26, 2021, 3:25am

The data layout is [batch, sequence, class]: batch size 128, sequence length 1, and 27 classes. As you said, I need to change data.shape from [128, 1, 27] to [128, 27, 1], right?
That is, change [batch, sequence, class] to [batch, class, sequence]?

ptrblck · October 26, 2021, 3:26am

Yes, you have to permute the model output to [batch_size, nb_classes, seq_len].

PatricYan · October 26, 2021, 3:26am

Thank you, I will try it now.

PatricYan · October 26, 2021, 3:36am

But after applying log_softmax = nn.LogSoftmax(dim=1) to data of shape [128, 27, 10] via out = log_softmax(data),
I try to change out from [128, 27, 10] back to [128, 10, 27] with out.permute(0, 2, 1).contiguous() and get this error:
AttributeError: 'numpy.ndarray' object has no attribute 'permute'

I want to get the result for the last sequence step, which is a slice of length 27.

ptrblck · October 26, 2021, 4:26am

If the number of classes is 27, dim1 should have this size.
.permute is a method of torch.Tensor, not of numpy.ndarray, so you should apply it to the model output tensor before passing it to the criterion.
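If the log-probabilities have already ended up in a NumPy array, the following sketch shows two ways to reorder the axes and then pick the last sequence step. It assumes an array of shape [128, 27, 10]; the names out_np, out_np_perm, out_t, and last_step are hypothetical. Note that a NumPy array carries no gradient information, so for training the permute should still happen on the tensor.

```python
import numpy as np
import torch

# Hypothetical NumPy array standing in for the converted output: [batch, classes, seq].
out_np = np.random.randn(128, 27, 10).astype(np.float32)

# Option 1: stay in NumPy, where the equivalent of permute is transpose.
out_np_perm = out_np.transpose(0, 2, 1)             # [128, 10, 27]

# Option 2: convert back to a tensor, then permute works as usual.
out_t = torch.from_numpy(out_np).permute(0, 2, 1)   # [128, 10, 27]

# Select the last sequence step: one length-27 slice per sample.
last_step = out_t[:, -1, :]                         # [128, 27]
```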