Tensor shapes for loss functions: CrossEntropyLoss and SmoothL1Loss

For the training in my project I use two models, which means I have two outputs.
They use the following loss functions:

ModelA: nn.CrossEntropyLoss()
ModelB: nn.SmoothL1Loss(reduction='mean')

I am a bit confused about the tensor shapes they require.
Just wanted to make sure:
loss_A_fn will get:
predictedTensor: [BATCH, VALUE], e.g. [8, 1]
labelTensor: [VALUE, VALUE, …], e.g. [8]

1) Is that right?

I think this thread points in the same direction: Looking for a cross entropy loss that accepts two tensors of the same shape

But my loss_B_fn will need [8, 1] for both (pred, label).
2) Is that right?

If yes:
Since my labels are 1-D tensors of shape [8], I reshape them with:
distance_label_dl = distance_label_dl.reshape(distance_label_dl.shape[0], 1)
3) Is that okay, or shall I rather use another (better) way?

4) Do I have to do this within my train_loop or shall I do that within the CustomDataset?

I think the main question is what your use case is and what your targets represent.
nn.CrossEntropyLoss is used for multi-class classification/segmentation use cases and expects class labels by default. However, in newer PyTorch releases you can also pass “soft” targets to it representing probabilities in [0, 1] in the same shape as the model output.
nn.SmoothL1Loss might be used for regression use cases, but I doubt it’s a good loss function for a classification use case.
You could of course one-hot encode your target (i.e. all classes are set to zero while the one active class is set to one), which might satisfy the shape requirements of both loss functions, but might not make much sense for the actual model training.
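
To make this concrete, here is a minimal sketch (the batch size of 4 and the 3 classes are made-up numbers, not taken from your setup):

import torch
import torch.nn as nn

batch_size, num_classes = 4, 3
logits = torch.randn(batch_size, num_classes)       # raw model output, shape [4, 3]

# default use case: targets are class indices with shape [batch_size]
hard_targets = torch.tensor([0, 2, 1, 2])
loss_hard = nn.CrossEntropyLoss()(logits, hard_targets)

# newer PyTorch releases (>= 1.10) also accept probability targets
# in the same shape as the logits, e.g. a one-hot matrix [4, 3]
soft_targets = torch.nn.functional.one_hot(hard_targets, num_classes).float()
loss_soft = nn.CrossEntropyLoss()(logits, soft_targets)

# SmoothL1Loss compares prediction and target of the same shape (regression)
preds = torch.randn(batch_size, 1)
targets = torch.randn(batch_size, 1)
loss_reg = nn.SmoothL1Loss(reduction='mean')(preds, targets)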

@ptrblck
Okay. Maybe the numbers are a bit confusing.
Let’s say the batch_size is: 6.
The class_categories are: 9 [0, 1, 2, 3, 4, 5, 6, 7, 8] → (0: Car, 1: Cyclist, 2: Person, …).
regression_labels: anything between 0 and 99.

So regarding my questions:

ModelA: nn.CrossEntropyLoss() → Is used for Classification.
ModelB: nn.SmoothL1Loss(reduction='mean') → Is used for regression.

So what I am doing right now:
Classification, nn.CrossEntropyLoss(): yc_predicted is a tensor of shape [batch_number, class_categories].
label_tensor shape: [6], so it only contains the batch dimension and no class dimension anymore.
Values look like: [0, 0, 3, 7, 4, 8]

So here it’s okay to feed nn.CrossEntropyLoss() with ([6, 9], [6]) → (predicted_tensor, target_tensor).
1) Is that right?
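
As a quick shape check with random values (yc_predicted here is just a placeholder tensor, not your actual model output):

import torch
import torch.nn as nn

loss_A_fn = nn.CrossEntropyLoss()

yc_predicted = torch.randn(6, 9)                  # [batch_size, class_categories], raw logits
label_tensor = torch.tensor([0, 0, 3, 7, 4, 8])   # [batch_size], class indices 0..8

loss_A = loss_A_fn(yc_predicted, label_tensor)    # scalar loss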

But in contrast, loss_B for the regression (with nn.SmoothL1Loss(reduction='mean')) needs different shapes:
loss_smoothL1([6, 1], [6, 1]) → (predicted_tensor, target_tensor).
2) Is that right?
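
Again just a sketch with placeholder values (yd_predicted is a made-up name for the regression output):

import torch
import torch.nn as nn

loss_B_fn = nn.SmoothL1Loss(reduction='mean')

yd_predicted = torch.randn(6, 1)                                           # [batch_size, 1]
distance_label = torch.tensor([[3.], [83.], [32.], [57.], [67.], [22.]])   # [6, 1]

loss_B = loss_B_fn(yd_predicted, distance_label)  # scalar loss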

If yes:
Since my labels are tensors with only a single dimension, I reshape them with:
distance_label_dl = distance_label_dl.reshape(distance_label_dl.shape[0], 1)

So from a tensor that initially has shape [6], e.g. [3, 83, 32, 57, 67, 22], I make a tensor with shape [6, 1]:

[[3], 
[83], 
[32], 
[57], 
[67], 
[22]]

3) Is that okay, or shall I rather use another (better) way?

4) Do I have to do this reshape within my train_loop or shall I do that within the CustomDataset?

Yes to 1. and 2., which is also explained in the docs for both loss functions.

  3. Yes, reshape can be used to add an additional dimension, but you could also use the more explicit .unsqueeze(1) operation.

  4. I would add these ops into the Dataset.__getitem__ and make sure all tensors are already in the expected shape inside the training loop (see the sketch below).
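
For the last point, a sketch of the Dataset approach (CustomDataset and the attribute names are placeholders, not your actual code): if each sample's regression target already leaves __getitem__ with shape [1], the default collate_fn stacks the samples to [batch_size, 1], so no reshape is needed in the training loop.

import torch
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, features, class_labels, distance_labels):
        self.features = features                # e.g. [N, ...] float tensor
        self.class_labels = class_labels        # [N] long tensor with class indices
        self.distance_labels = distance_labels  # [N] float tensor with regression targets

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        x = self.features[idx]
        y_class = self.class_labels[idx]                 # 0-dim, collated to [batch_size]
        y_dist = self.distance_labels[idx].unsqueeze(0)  # 0-dim -> [1], collated to [batch_size, 1]
        return x, y_class, y_dist

# example with random data: y_class comes out with shape [6], y_dist with shape [6, 1]
dataset = CustomDataset(torch.randn(12, 3), torch.randint(0, 9, (12,)), torch.rand(12) * 99)
loader = DataLoader(dataset, batch_size=6)
x, y_class, y_dist = next(iter(loader))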
