Pixel-wise binary classification - which loss function to use?

WeiQin_Chuah · October 7, 2019, 10:13am

I would like my model to predict, say, K planes from an RGB image. The model outputs a tensor of size [B, K+1, H, W] (the final layer is a softmax), where the extra channel is the non-planar mask. I then sum all K planar masks, which leaves a tensor of size [B, 2, H, W]. My ground-truth labels use 1 for planar regions and 0 for non-planar regions.

Which loss function would be best for this application?

Also, I’ve tried using cross entropy; however, the loss becomes NaN after 2 iterations.

Thank you.

ptrblck · October 8, 2019, 12:56am

I’m not sure I fully understand your use case, but it seems you are dealing with a two-class classification (2 output channels and two class indices).
If your output has the shape [B, 2, H, W], your target should have the shape [B, H, W] and contain the class indices 0 and 1 to be able to use nn.CrossEntropyLoss.
Also, make sure to pass raw logits to this criterion, as internally F.log_softmax and nn.NLLLoss will be applied. If you are using a softmax activation as the last non-linearity, could you please remove it?
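
A minimal sketch of the shapes described above, assuming random tensors in place of the real model output and labels (the sizes B, H, W are made up for illustration). The logits are passed directly to the criterion, without a softmax in front:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
B, H, W = 2, 4, 4

# Stand-in for the raw model output: two channels, no softmax applied.
logits = torch.randn(B, 2, H, W, requires_grad=True)

# Stand-in for the ground truth: int64 class indices (0 or 1), no channel dim.
target = torch.randint(0, 2, (B, H, W))

# nn.CrossEntropyLoss applies log_softmax internally, so it expects raw logits.
criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)
loss.backward()

print(loss.item())
```

Note that the target is an integer (torch.long) tensor of class indices, not a one-hot or float mask.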

Could you check your inputs for invalid values (Inf or NaN)?
If you can’t find any invalid values, could you lower the learning rate a bit and check whether you get a NaN loss again?
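
One way to run such a check is sketched below. The helper name report_invalid is hypothetical, and the tensors are dummy data standing in for the real inputs, outputs, or targets:

```python
import torch

def report_invalid(name, tensor):
    """Return True if the tensor contains NaN or +/-Inf, printing a short summary."""
    n_nan = torch.isnan(tensor).sum().item()
    n_inf = torch.isinf(tensor).sum().item()
    if n_nan or n_inf:
        print(f"{name}: {n_nan} NaN, {n_inf} Inf values")
        return True
    return False

# Dummy data: one clean tensor and one with deliberately injected bad values.
clean = torch.randn(2, 3, 4, 4)
broken = clean.clone()
broken[0, 0, 0, 0] = float("nan")
broken[1, 2, 3, 3] = float("inf")

clean_flag = report_invalid("clean", clean)
broken_flag = report_invalid("broken", broken)
```

In a training loop, the same helper could be called on the input batch, the model output, and the loss before each backward pass to narrow down where the first invalid value appears.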

WeiQin_Chuah · October 8, 2019, 1:34am

After some digging in the PyTorch documentation, I found nn.BCELoss, which is the cross entropy loss for binary classification. Shouldn’t I use that instead? I can repeat my target from [B, H, W] to [B, 2, H, W] so that it matches the shape of my output.
And by using nn.BCELoss, I would not have to remove the final softmax layer.

I tried using cross entropy loss as you suggested but I am getting another error:

RuntimeError: copy_if failed to synchronize: device-side assert triggered

ptrblck · October 8, 2019, 1:37am

nn.BCE(WithLogits)Loss would be the alternative, and your output and target should then both have the shape [B, 1, H, W].
I would recommend using nn.BCEWithLogitsLoss for numerical stability, which still means removing the last activation layer (a sigmoid would only be needed for nn.BCELoss).
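
A minimal sketch of this setup, assuming a single-channel output and random stand-in data (sizes are made up for illustration). The integer mask gets a channel dimension and is cast to float, since this criterion expects a float target of the same shape as the output:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
B, H, W = 2, 4, 4

# Stand-in for the raw model output: one channel, no sigmoid applied.
logits = torch.randn(B, 1, H, W, requires_grad=True)

# Stand-in for the ground-truth mask with values 0 (non-planar) and 1 (planar).
mask = torch.randint(0, 2, (B, H, W))

# Add the channel dimension and convert to float to match the output.
target = mask.unsqueeze(1).float()

# nn.BCEWithLogitsLoss applies the sigmoid internally.
criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, target)
loss.backward()

# For comparison only: the same value computed via an explicit sigmoid + nn.BCELoss.
reference = nn.BCELoss()(torch.sigmoid(logits.detach()), target)

print(loss.item(), reference.item())
```

For moderate logits like these, both values agree closely; the combined criterion is the safer choice when logits become large in magnitude.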