Proper input to Loss Function (CrossEntropy / NLL)

blindelephants · October 6, 2018, 12:37pm

If I have a tensor that is of shape [96, 16, 160] that is the output from a model I’m trying to train, and my targets are in a tensor of shape [96, 16, 1] (where there are 160 different classes, hence the appearance of 160 in the first, and 1 in the second), what’s the proper method for putting these two tensors into a loss function?

Should I just use .view(-1, 160) and .view(-1, 1)?

blindelephants · October 6, 2018, 12:54pm

Oops, just realized that .view(-1, 1) will not work for the target values passed to CrossEntropyLoss. It raises an error stating that multi-target is not supported. So…

Would this be the proper approach?

loss = lossFunction(output.view(-1, 160), targets.view(-1))

ptrblck · October 6, 2018, 4:42pm

Depending on what dim0 and dim1 represent, this might be a valid approach.
I assume dim0 is the batch size, so what is dim1?
nn.CrossEntropyLoss takes an input of shape [N, C, additional dims] and a target of [N, additional dims].
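
To make that shape convention concrete, here is a minimal sketch. The sizes N, C and D are made up for illustration and are not taken from the model in question; the only point is that the class dimension sits at position 1 and the target has the same shape as the input with that dimension removed.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration
N, C, D = 4, 160, 7          # batch, classes, one additional dimension

logits = torch.randn(N, C, D)            # input:  [N, C, additional dims]
target = torch.randint(0, C, (N, D))     # target: [N, additional dims], class indices

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)         # scalar, averaged over all N * D positions
print(loss.shape)
```

Note that the target holds integer class indices (dtype torch.long), not one-hot vectors, and has no trailing dimension of size 1.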

Could you explain the dimensions so that we can reshape them if necessary?

blindelephants · October 6, 2018, 4:45pm

The dimensions are [seq_len, batch, features], as per the documentation for RNN input. I’m still a little unclear on the proper usage of seq_len, though, or rather on how it is treated internally.

Thanks,

ptrblck · October 6, 2018, 4:49pm

features here stands for the class logits, i.e. you have 160 classes?
If so, you could flatten the seq_len and batch dimensions together, in case you want a prediction for each time step of the sequence:

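import torch
import torch.nn as nn

# Model output: [seq_len, batch, n_classes]; targets: class indices in [0, 160)
x = torch.randn(96, 16, 160)
y = torch.empty(96, 16, dtype=torch.long).random_(160)

criterion = nn.CrossEntropyLoss()
# Flatten seq_len and batch into one dimension: [1536, 160] and [1536]
loss = criterion(x.view(-1, 160), y.view(-1))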
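
As a sanity check, the flattening approach can be compared with the [N, C, additional dims] form mentioned earlier in the thread. This is only a sketch: the names output, targets, loss_flat and loss_perm are illustrative, and the targets are created with the trailing dimension of size 1 from the original question. With the default mean reduction, both variants should give the same value up to floating-point error.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch, n_classes = 96, 16, 160

output = torch.randn(seq_len, batch, n_classes)              # [96, 16, 160]
targets = torch.randint(0, n_classes, (seq_len, batch, 1))   # [96, 16, 1]

criterion = nn.CrossEntropyLoss()

# Option 1: merge seq_len and batch, drop the trailing target dimension
loss_flat = criterion(output.view(-1, n_classes), targets.view(-1))

# Option 2: move the class dimension to position 1 -> [96, 160, 16],
# and squeeze the target to [96, 16]
loss_perm = criterion(output.permute(0, 2, 1), targets.squeeze(-1))

print(torch.allclose(loss_flat, loss_perm, atol=1e-5))
```

Either form works here; flattening is probably the simpler choice when every time step of every sequence should contribute equally to the loss.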

blindelephants · October 6, 2018, 4:51pm

Great, thank you. This helps a lot, I really appreciate it.