ValueError about batchsize while doing semantic segmentation


I’m using a UNet implemented in PyTorch for multimodal semantic segmentation. The original image size is 512×512×3 (RGB). After going through the DataLoader, the network input has shape [1, 3, 512, 512], where the first dimension is the batch size (I set the batch size to 1). The network output has shape [1, 2, 512, 512] since num_class = 2.

But I ran into a problem when computing CrossEntropyLoss. The error is ValueError: Expected input batch_size (1) to match target batch_size (3). I think this error has something to do with the mask, whose size is the same as the original image (512×512×3). I have tried the same mask size in another UNet implemented in Keras for single-modal segmentation, and it worked well. When I process the mask with .convert('L'), the error changes to ValueError: Expected input batch_size (1) to match target batch_size (512). I’m a beginner and quite confused about this. How can I fix the error?

Thank you.

I assume you are using nn.CrossEntropyLoss for your segmentation use case.
If that’s the case, then you would have to pass the target mask in the shape [batch_size, height, width] as a LongTensor containing the class indices in the range [0, nb_classes-1].
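A minimal sketch of the expected shapes (using random tensors in place of your model output and mask, and assuming 2 classes and 512×512 images as in your post):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

batch_size, nb_classes, height, width = 1, 2, 512, 512

# Model output: raw logits, shape [batch_size, nb_classes, height, width]
output = torch.randn(batch_size, nb_classes, height, width)

# Target: class indices in [0, nb_classes-1], shape [batch_size, height, width].
# Note there is NO channel dimension, and the dtype must be torch.long.
target = torch.randint(0, nb_classes, (batch_size, height, width))

loss = criterion(output, target)
print(loss)  # scalar loss
```

If your target still carries a channel dimension of 3, the loss function will interpret one of the extra dimensions as the batch size, which produces exactly the mismatch errors you are seeing.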

Based on the shape of your mask, it seems it might be an RGB image?
If so, you would need to map the colors to class indices first.
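A sketch of that mapping, assuming a hypothetical two-color palette (replace color_to_class with the actual colors used in your masks):

```python
import torch

# Hypothetical palette: RGB color -> class index.
# Replace with the colors your dataset actually uses.
color_to_class = {
    (0, 0, 0): 0,        # background
    (255, 255, 255): 1,  # foreground
}

def rgb_mask_to_class_indices(mask):
    """mask: uint8 tensor of shape [H, W, 3] -> LongTensor of shape [H, W]."""
    h, w, _ = mask.shape
    target = torch.zeros(h, w, dtype=torch.long)
    for color, cls in color_to_class.items():
        # Pixels matching this color on all 3 channels get this class index
        matches = (mask == torch.tensor(color, dtype=mask.dtype)).all(dim=-1)
        target[matches] = cls
    return target

# Tiny usage example: top half white -> class 1, bottom half black -> class 0
mask = torch.zeros(4, 4, 3, dtype=torch.uint8)
mask[:2] = 255
target = rgb_mask_to_class_indices(mask)
print(target.shape, target.dtype)  # torch.Size([4, 4]) torch.int64
```

The resulting [H, W] target (stacked to [batch_size, H, W] by the DataLoader) is what nn.CrossEntropyLoss expects.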