DataLoader not reading in masks correctly?

So here we have to be careful about the objective and how to interpret the output. Are you training with a multilabel objective or a cross-entropy loss function? If it is cross-entropy, then doing pred > 0.5 doesn’t make sense, as you can end up with multiple classes predicted True, as you see here. If you are using code similar to the example I posted to end up with a single class (color) per pixel, then the input to that code should be the raw outputs (logits) of the model; you don’t even need the sigmoid operation!
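For reference, a minimal sketch of that (model and x stand in for your own model and input batch; it assumes the model returns one channel per class, i.e. output shape [N, num_classes, H, W]):

import torch

logits = model(x)                   # raw model outputs; no sigmoid needed
pred = torch.argmax(logits, dim=1)  # one class index per pixel, shape [N, H, W]

Taking argmax over the class dimension already picks the most likely class, and since softmax is monotonic, applying it first wouldn’t change the result.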


Wow, you have been so spot on. I am training with a cross-entropy loss function for multi-class segmentation. So instead of doing this:

pred = torch.sigmoid(model(x))
out = (pred > 0.5).float()

I should instead do something like:

out = (model(x)).float()

For my own understanding, would you mind explaining why this is the case? My original code was based on binary segmentation and I didn’t understand it well enough to correctly extend it to multi-class, which is why I am having trouble with these things. Thank you very much!

The model output should already be floating point by default here, so you probably don’t need the .float().

We can describe the difference with two segmentation examples. If we have three classes [tree, branch, leaf], then a multilabel objective makes sense, as both “branch” and “leaf” can be considered “tree” and it is useful to distinguish “branches” from “leaves.” On the other hand, if we have three classes [person, car, road], then a multilabel objective probably doesn’t make sense, as the knowledge that the classes are mutually exclusive helps simplify the training task. For the multilabel (per-class binary) objective we just want a yes/no answer for each class, which is why we do something like pred > 0.5. For the cross-entropy case, we only want the maximum, to see which class the model thinks is most likely; just checking that the output is > 0.5 isn’t enough, as the predictions for multiple classes could meet that requirement.
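To make the contrast concrete, a rough sketch of the two decision rules (again assuming model and x are yours and the output has one channel per class):

import torch

logits = model(x)  # raw outputs, shape [N, C, H, W]

# Multilabel (e.g. tree/branch/leaf): an independent yes/no decision per class,
# so several channels can be True at the same pixel.
multilabel_pred = torch.sigmoid(logits) > 0.5

# Multi-class cross-entropy (e.g. person/car/road): the classes are exclusive,
# so we take the single most likely class per pixel.
multiclass_pred = torch.argmax(logits, dim=1)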

Thanks for the explanation. So I guess multi-class is the way to go when the classes are mutually exclusive, and otherwise maybe multi-label. Either way, multi-class makes the most sense for my application.

So I took your advice and now I simply do out = model(x), and this is what out looks like:

out: tensor([[[[ 0.0603,  0.0611,  0.0599,  ...,  0.0619,  0.0622,  0.0633],
          [ 0.0587,  0.0588,  0.0576,  ...,  0.0594,  0.0602,  0.0617],
          [ 0.0596,  0.0602,  0.0584,  ...,  0.0585,  0.0586,  0.0604],
          ...,
          [ 0.0595,  0.0581,  0.0577,  ...,  0.0571,  0.0570,  0.0580],
          [ 0.0606,  0.0591,  0.0586,  ...,  0.0596,  0.0583,  0.0587],
          [ 0.0600,  0.0592,  0.0585,  ...,  0.0581,  0.0570,  0.0597]],

         [[ 0.0936,  0.0909,  0.0931,  ...,  0.0901,  0.0928,  0.0901],
          [ 0.0913,  0.0897,  0.0896,  ...,  0.0874,  0.0919,  0.0909],
          [ 0.0905,  0.0874,  0.0892,  ...,  0.0853,  0.0926,  0.0925],
          ...,
          [ 0.0899,  0.0900,  0.0890,  ...,  0.0887,  0.0917,  0.0910],
          [ 0.0909,  0.0896,  0.0917,  ...,  0.0888,  0.0917,  0.0896],
          [ 0.0918,  0.0914,  0.0916,  ...,  0.0911,  0.0926,  0.0914]],

         [[-0.0600, -0.0611, -0.0591,  ..., -0.0603, -0.0585, -0.0607],
          [-0.0616, -0.0619, -0.0589,  ..., -0.0609, -0.0566, -0.0591],
          [-0.0613, -0.0608, -0.0588,  ..., -0.0609, -0.0576, -0.0588],
          ...,
          [-0.0613, -0.0618, -0.0602,  ..., -0.0609, -0.0585, -0.0592],
          [-0.0612, -0.0615, -0.0602,  ..., -0.0612, -0.0591, -0.0603],
          [-0.0599, -0.0587, -0.0590,  ..., -0.0586, -0.0561, -0.0584]]]],
       device='cuda:0')

Here is the prediction image:

And here are the predictions for classes 0, 1, and 2, saved as images respectively:

So the resulting output is pretty much class 1 everywhere, since the class-1 logits (around 0.09) are the largest at every pixel.
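One quick way to confirm that from the tensor above:

out.mean(dim=(2, 3))   # per-class mean logits: roughly [0.06, 0.09, -0.06] here
out.argmax(dim=1)      # class 1 wins at essentially every pixel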

How many epochs are you training for, and what is the trend of the training loss? You might want to decrease the learning rate and train for a while to see if your model can overfit a single training image.
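If it helps, a minimal sketch of that overfitting sanity check (model, x, and y are placeholders for your own model, image batch, and mask of class indices; nn.CrossEntropyLoss is assumed since you are training with cross-entropy):

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # try lowering this if the loss plateaus

for epoch in range(100):
    optimizer.zero_grad()
    logits = model(x)            # raw logits, [N, C, H, W]; no sigmoid/softmax
    loss = criterion(logits, y)  # y: class-index mask, [N, H, W], dtype torch.long
    loss.backward()
    optimizer.step()
    print(epoch, loss.item())    # on a single image the loss should approach 0

Note that nn.CrossEntropyLoss applies log-softmax internally, which is why the raw logits go in directly.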


Yeah, so as you know I am training on 1 image and testing on 1 image. The learning rate is 0.0001, the batch size is 1, and the number of epochs is 1. Let me increase the epochs to 100.

Training update: the loss started at 1.0 on the first epoch and has been going down after each epoch. Not sure which epoch it is on right now (definitely before 50, though), but the loss is down to 0.3 so far.

Update

Prediction image:

Class images (0, 1, and 2):

This looks way more reasonable. I think you helped me solve my issue. Now I just need to get more training data, increase the number of epochs, and be patient! Thank you very much, you have been a phenomenal help!
