Passing the weights to CrossEntropyLoss correctly

A question about the above, but for the case of soft targets (where the target for each sample is a probability distribution over all C classes, rather than a single class index):

I notice from the documentation for nn.CrossEntropyLoss that the normalization over the full batch (dividing by the sum of the per-sample class weights when reduction='mean') is NOT performed for the soft-target version of this loss; the weighted per-sample losses are simply averaged over the batch.

(I was drawn to this thread because, with the same class weights, I am getting results very heavily biased towards the minority class when I use the soft-target version with one-hot vectors as targets, whereas everything works as expected with hard targets. I think I have tracked it down to the lack of normalization in the soft-target case: batches that contain the minority class carry a much larger total weight when backward() is performed, so effectively a much larger step is taken for them. With the per-batch normalization in place, the class weights affect the direction of the step but not its magnitude. A small demonstration follows.)
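Here is a minimal sketch of what I mean (my own example, not taken from the docs): with one-hot soft targets the weighted per-sample losses are divided only by the batch size, so the reduced loss, and hence the gradient magnitude, is inflated whenever the up-weighted minority class appears in the batch, whereas the hard-target version divides by the sum of the weights and stays on the same scale.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
num_classes = 3
weight = torch.tensor([1.0, 1.0, 10.0])  # class 2 plays the role of the up-weighted minority class

logits = torch.randn(8, num_classes)
hard_targets = torch.randint(num_classes, (8,))
soft_targets = F.one_hot(hard_targets, num_classes).float()  # same labels, as one-hot distributions

# Hard targets: (weighted sum of per-sample losses) / (sum of per-sample weights)
loss_hard = nn.CrossEntropyLoss(weight=weight)(logits, hard_targets)

# Soft targets: (weighted sum of per-sample losses) / (batch size), no weight normalization
loss_soft = nn.CrossEntropyLoss(weight=weight)(logits, soft_targets)

print(loss_hard.item(), loss_soft.item())
```

If I am reading the documented formulas correctly, the two values differ by exactly the factor [sum of per-sample weights] / [batch size], which is 1 only for batches containing no minority-class samples; that ratio is what produces the over-sized steps.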

Note that performing the normalization seems straightforward for the case of soft targets: the normalizer becomes the sum over the batch of [class weight] * [target probability] summed over the classes of each sample, rather than the sum of [class weight of the target class] for each sample (the two coincide when the targets are one-hot).
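A rough sketch of what I have in mind (my own helper, not an existing PyTorch API; the function name and signature are made up):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def weighted_soft_ce_normalized(logits, soft_targets, weight):
    """Weighted soft-target cross entropy, normalized the same way as the hard-target case."""
    log_probs = F.log_softmax(logits, dim=1)
    # Per-sample loss: -sum_c w_c * y_{n,c} * log_softmax(x_n)_c
    per_sample = -(weight * soft_targets * log_probs).sum(dim=1)
    # Normalizer: sum over the batch of sum_c w_c * y_{n,c}
    # (reduces to the sum of w_{y_n} when the targets are one-hot)
    normalizer = (weight * soft_targets).sum()
    return per_sample.sum() / normalizer
```

As a sanity check, with one-hot targets this should reproduce the weighted hard-target loss up to floating point:

```python
weight = torch.tensor([1.0, 1.0, 10.0])
logits = torch.randn(8, 3)
hard = torch.randint(3, (8,))
soft = F.one_hot(hard, 3).float()

print(weighted_soft_ce_normalized(logits, soft, weight))
print(nn.CrossEntropyLoss(weight=weight)(logits, hard))  # same value
```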

Can you confirm that not performing normalization for the batch in the case of the ‘soft target’ version of nn.CrossEntropyLoss is a limitation and that my reasoning above is correct?

Thanks!