Combining CrossEntropyLoss with MSELoss

I’m currently working on a semantic segmentation problem where I want to classify every pixel of my 256×256 input image into one of 256 classes. I currently use CrossEntropyLoss and it works OK.

In my specific problem, the 0–255 class numbers also have the property that confusing class 5 with class 6, for instance, is not as “bad” as confusing class 5 with class 200. In other words, mistaking “close” classes matters less than mistaking “far” classes. I therefore thought of adding a second loss to my system, an L2 loss, i.e. MSELoss.

However, for an input/label of size (Batch × 256 × 256), the output of my network is (Batch × 256 × 256 × 256), so I can’t use MSELoss(out, label) directly. Also, as I understand it, taking the argmax over the class dimension and then applying MSELoss would make the loss non-differentiable.
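For concreteness, the mismatch described above looks like this (batch size and a small 8×8 crop instead of the full 256×256 image are assumed, purely for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 256 classes, a small 8x8 crop instead of the
# full 256x256 image, batch of 2 -- just to show the shape mismatch.
out = torch.randn(2, 256, 8, 8, requires_grad=True)   # (Batch, classes, H, W)
label = torch.randint(0, 256, (2, 8, 8))              # (Batch, H, W)

# CrossEntropyLoss expects exactly this shape pair, so it works:
ce = nn.CrossEntropyLoss()(out, label)

# argmax produces a (Batch, H, W) tensor matching the label's shape...
pred = out.argmax(dim=1).float()
# ...but it breaks the graph: pred.requires_grad is False, so
# MSELoss(pred, label.float()) would send no gradient to the network.
```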

Is there any way to get around this?
Thanks in advance…


One way of incorporating an underlying metric into a distance between probability measures is to use the Wasserstein distance as the loss. Cross-entropy loss is the KL divergence (not quite a distance, but almost) between the predicted probabilities and the one-hot distribution given by the labels. A PyTorch implementation and a link to Frogner et al.’s paper are linked below.
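Since the 256 classes here live on an ordered 1-D support, the Wasserstein-1 distance has a closed form: it is the L1 distance between the two cumulative distribution functions. A minimal sketch (function name and shapes are my own, assumed from the question):

```python
import torch
import torch.nn.functional as F

def wasserstein1_loss(logits, labels):
    """Hypothetical per-pixel Wasserstein-1 loss for ordered classes.

    logits: (B, C, H, W) raw network outputs
    labels: (B, H, W) integer class ids in [0, C)
    """
    C = logits.shape[1]
    probs = F.softmax(logits, dim=1)
    target = F.one_hot(labels, num_classes=C).permute(0, 3, 1, 2).float()
    # For distributions on an ordered 1-D support, W1 is the L1
    # distance between the cumulative distribution functions.
    cdf_p = probs.cumsum(dim=1)
    cdf_t = target.cumsum(dim=1)
    return (cdf_p - cdf_t).abs().sum(dim=1).mean()
```

Because it only uses softmax and cumsum, the loss stays differentiable end to end, and far-off predictions accumulate CDF error over more classes than near misses do.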

An alternative could be to use the expected squared error loss per pixel. EDIT: The notebook doesn’t solve the per-pixel problem. For that you might use a Rubner-style approach with the label as a third dimension, or you might just use MSE…
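The expected squared error mentioned above can be written per pixel as Σ_c p_c (c − y)², which sidesteps the argmax entirely and so stays differentiable. A sketch, with shapes assumed from the question (the function name and any weighting factor you combine it with are hypothetical):

```python
import torch
import torch.nn.functional as F

def expected_sq_error(logits, labels):
    """Hypothetical expected-squared-error loss over the class dimension.

    logits: (B, C, H, W) raw network outputs
    labels: (B, H, W) integer class ids in [0, C)
    """
    C = logits.shape[1]
    probs = F.softmax(logits, dim=1)                       # (B, C, H, W)
    classes = torch.arange(C, dtype=probs.dtype, device=probs.device)
    # Squared distance of every class index to the true label, per pixel:
    diff2 = (classes.view(1, C, 1, 1) - labels.unsqueeze(1).float()) ** 2
    # Expectation under the predicted distribution, averaged over pixels:
    return (probs * diff2).sum(dim=1).mean()
```

This could then be added to CrossEntropyLoss with some weighting factor, e.g. `ce + lam * expected_sq_error(out, label)`, where `lam` is a hyperparameter you would have to tune.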
Best regards

