I'm using a UNet implemented in PyTorch for multimodal semantic segmentation. The original images are 512×512×3 (RGB). After going through the DataLoader, the network input has shape [1, 3, 512, 512], where the first dimension is the batch size (I set it to 1). The network output has shape [1, 2, 512, 512], since num_class = 2.
But I get an error when computing CrossEntropyLoss: `ValueError: Expected input batch_size (1) to match target batch_size (3)`. I think this has something to do with the mask, which is the same size as the original image (512×512×3). The same masks work fine in another UNet implemented in Keras for single-modal segmentation. If I preprocess the mask with PIL's `convert('L')`, the error changes to `ValueError: Expected input batch_size (1) to match target batch_size (512)`. I'm a beginner and I'm quite confused by this. How can I fix this error?
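For reference, here is a minimal sketch of the shapes I believe `nn.CrossEntropyLoss` expects for segmentation, with a guessed RGB-to-index conversion for the mask (the thresholding rule is my assumption, since I'm not sure how my mask colors map to classes):

```python
import torch
import torch.nn as nn

# nn.CrossEntropyLoss for segmentation expects:
#   input : [N, C, H, W]  raw logits from the network
#   target: [N, H, W]     class indices, dtype long, values in 0..C-1
N, C, H, W = 1, 2, 512, 512
logits = torch.randn(N, C, H, W)

# Mask as currently loaded: RGB, shape [H, W, 3] (dummy data here).
rgb_mask = torch.zeros(H, W, 3, dtype=torch.uint8)

# Collapse RGB to a single-channel index map. For a binary mask with
# white foreground, thresholding one channel would be enough; the
# actual color-to-class mapping is an assumption on my part.
index_mask = (rgb_mask[..., 0] > 127).long()  # [H, W]
target = index_mask.unsqueeze(0)              # [N, H, W]

loss = nn.CrossEntropyLoss()(logits, target)
print(logits.shape, target.shape, loss.item())
```

If I understand correctly, passing a [512, 512, 3] or [512, 512] tensor as the target makes PyTorch read the first dimension as the batch size, which would explain the 3 and 512 in the error messages.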