[Solved] Dimension mismatch when applying transforms to images and masks in a segmentation task

I am working on a semantic segmentation task and need to build a custom dataset.
The images are 24 bits per pixel and the masks are 8 bits per pixel.
My custom dataset is as follows:

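import glob
import os

from PIL import Image
from torch.utils.data import Dataset
import torchvision.transforms.functional as TF

# `args` is assumed to be a global argparse namespace defined elsewhere.
class MyDataset(Dataset):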
    def __init__(self, root, set_name,):
        super(MyDataset, self).__init__()
        assert set_name == 'train' or set_name == 'val' or set_name == 'test'
        self.root = root
        self.set_name = set_name
        self.image_list = glob.glob(os.path.join(
            root,
            set_name,
            args.images_folder,
            "*.tif",
        ))
        self.label_list = glob.glob(os.path.join(
            root,
            set_name,
            args.labels_folder,
            "*.tif",
        ))

    def __getitem__(self, index):

        images = Image.open(self.image_list[index])
        masks = Image.open(self.label_list[index])
        t_images = TF.to_tensor(images)
        t_masks = TF.to_tensor(masks)
        return t_images, t_masks

    def __len__(self):

        return len(self.image_list)

But this raises an error like the following:

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 3 and 1 in dimension 1 at /pytorch/aten/src/TH/generic/THTensorMath.cpp:3616

I think this is because the masks have only a single channel, so I changed the code:

        images = Image.open(self.image_list[index]).convert('RGB')
        masks = Image.open(self.label_list[index]).convert('RGB')

Then it works, but I have to reshape the masks when I feed the data and target to the network.
So I would like to know whether there is a way to avoid changing the mask format when reading the dataset. I do not know whether the conversion above has any bad side effects.

Thanks in advance :smiley:

I don’t think this problem is related to the DataLoader, because in this class you are not mixing the two tensors t_images and t_masks. I think the problem happens later on in your code. As a test, can you print the sizes of the tensors loaded from the data_loader:

batch_x, batch_y = next(iter(data_loader))
print(batch_x.shape, batch_y.shape)

If this works, then we can confirm that there is no problem with the MyDataset class that you have defined.

Hi @vmirly1, thanks for your response.
I tried the code you suggested, as follows:

    dst = MyDataset(root = args.dataset_root_dir, set_name="train")
    #print(len(dst)) 
    data_loader = DataLoader(dataset=dst, batch_size=args.batch_size, shuffle=False)
    batch_x, batch_y = next(iter(data_loader))
    print(batch_x.shape, batch_y.shape)

The shapes of data and target are:

torch.Size([10, 3, 512, 512]) torch.Size([10, 1, 512, 512])

This confirms that MyDataset is fine.
I found that the dimension mismatch error occurs when I apply the TF.vflip() and TF.hflip() transforms, and I worked around it like this:

def transform(self, image, mask):
        # Resize
        resize = transforms.Resize(size=(args.input_size, args.input_size))
        image = resize(image)
        mask = resize(mask)
        # Random horizontal flipping
        if random.random() > 0.5:
            image = TF.hflip(image)
            mask = TF.hflip(image)
        # Random vertical flipping
        if random.random() > 0.5:
            image = TF.vflip(image)
            mask = TF.vflip(mask)
        mask = TF.to_grayscale(mask)
        # Transform to Tensor
        image = TF.to_tensor(image)
        mask = TF.to_tensor(mask)
        image = TF.normalize(image, mean=[0.4353, 0.4452, 0.4131],
                                    std=[0.2044, 0.1924, 0.2013])
        return image, mask

This means I read both the images and the masks as RGB, apply the transforms, and then convert the masks to grayscale, so that the DataLoader yields masks of shape [batch_size, 1, height, width]. Is there a better solution?
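
For readers following along, here is a minimal sketch of the paired-flip pattern the transform above is aiming for: one random draw per flip, applied to both the image and the mask. It works on tensors with torch.flip rather than on PIL images, and `paired_random_flip` and its `p` parameter are hypothetical names introduced only for this illustration.

```python
import random

import torch


def paired_random_flip(image, mask, p=0.5):
    """Flip image and mask together; each flip is decided by a single draw."""
    if random.random() < p:                    # horizontal flip (last axis is width)
        image = torch.flip(image, dims=[-1])
        mask = torch.flip(mask, dims=[-1])     # flip the mask itself, not the image
    if random.random() < p:                    # vertical flip (second-to-last axis is height)
        image = torch.flip(image, dims=[-2])
        mask = torch.flip(mask, dims=[-2])
    return image, mask


# Small stand-ins for a [3, H, W] image and a [1, H, W] mask.
image = torch.arange(3 * 2 * 4, dtype=torch.float32).reshape(3, 2, 4)
mask = torch.arange(2 * 4).reshape(1, 2, 4)

out_image, out_mask = paired_random_flip(image, mask, p=1.0)  # p=1.0 forces both flips
print(out_image.shape, out_mask.shape)
```

Because each branch passes the mask through its own flip, the channel counts of the two outputs never change, whatever the random draws are.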

Oh, I will go and edit the topic title. Thanks a lot :smiley:

Okay, so now we know that the MyDataset class is ok.

But now, why do you have this other transform() function? Have you removed the transformations from __getitem__, and do you call this function inside __getitem__ instead?

Also, you shouldn’t have to read the masks as RGB. You can read them directly as grayscale.
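
A quick way to check what PIL is actually handing back is to look at the image mode and the shape of the converted array. The sketch below uses small in-memory images as stand-ins for the .tif files (an assumption; the real files are not available here), and `describe` is a hypothetical helper.

```python
import numpy as np
from PIL import Image

# Stand-ins for files on disk: an 8-bit single-channel mask and a 24-bit RGB image.
mask = Image.fromarray(np.zeros((4, 6), dtype=np.uint8))       # becomes mode "L"
image = Image.fromarray(np.zeros((4, 6, 3), dtype=np.uint8))   # becomes mode "RGB"


def describe(img):
    """Return the PIL mode and the shape of the array the image converts to."""
    return img.mode, np.array(img).shape


mask_info = describe(mask)
image_info = describe(image)
rgb_mask_info = describe(mask.convert("RGB"))   # forcing RGB adds two redundant channels
print(mask_info, image_info, rgb_mask_info)
```

If the mask already reports a single-channel mode, converting it to RGB only adds channels that have to be removed again later.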

Oh yes, I removed the transformations from __getitem__. I defined a transform method in the MyDataset class and call it in __getitem__.

The code is as follows:

class MyDataset(Dataset):
    def __init__(self, root,set_name,):
        super(MyDataset, self).__init__()
        assert set_name == 'train' or set_name == 'val' or set_name == 'test'
        self.root = root
        self.set_name = set_name
        self.mapping = {
            0: 0,
            255: 1
        }
        self.image_list = glob.glob(os.path.join(
            root,
            set_name,
            args.images_folder,
            "*.tif",
        ))
        self.label_list = glob.glob(os.path.join(
            root,
            set_name,
            args.labels_folder,
            "*.tif",
        ))

    def mask_to_class(self, mask):
        for k in self.mapping:
            mask[mask == k] = self.mapping[k]
        return mask

    def transform(self, image, mask):
        # Resize
        resize = transforms.Resize(size=(args.input_size, args.input_size))
        image = resize(image)
        mask = resize(mask)
        # Random horizontal flipping
        if random.random() > 0.5:
            image = TF.hflip(image)
            mask = TF.hflip(image)
        # Random vertical flipping
        if random.random() > 0.5:
            image = TF.vflip(image)
            mask = TF.vflip(mask)
        # masks to gray_scale
        mask = TF.to_grayscale(mask)
        # Transform to Tensor
        image = TF.to_tensor(image)
        mask = TF.to_tensor(mask)
        mask = self.mask_to_class(mask)
        # Normalized only images? Yes
        image = TF.normalize(image, mean=[0.4353, 0.4452, 0.4131],
                             std=[0.2044, 0.1924, 0.2013])
        return image, mask

    def __getitem__(self, index):

        images = Image.open(self.image_list[index]).convert('RGB')
        masks = Image.open(self.label_list[index]).convert('RGB')
        t_images, t_masks = self.transform(images, masks)
        return t_images, t_masks

    def __len__(self):

        return len(self.image_list)

And I test the dataset and DataLoader like this:

    dst = MyDataset(root = args.dataset_root_dir, set_name="train")
    #print(len(dst)) 
    data_loader = DataLoader(dataset=dst, batch_size=args.batch_size, shuffle=False)
    for i, (data, target) in enumerate(data_loader):
        print(i)
        print(data.shape)
        print(target.shape)
        break

data.shape is [batch_size, 3, height, width]
target.shape is [batch_size, 1, height, width]

But when I read the masks as grayscale, a dimension mismatch error occurs when I apply the vertical and horizontal flips.

How can I read the images as RGB and the masks as grayscale directly, without so many conversions?
Thank you very much ! :blush:

But I am able to read a gray-scale image as is, without needing to convert that to RGB:

>>> img = Image.open('img-gray.jpg')
>>> img_hflip = TF.hflip(img)
>>> img_tensor = TF.to_tensor(img_hflip)
>>> img_tensor.shape
torch.Size([1, 218, 178])

I still believe this error is related to things outside the dataloader.

Okay, you are right. Converting to RGB is not needed for these transforms. I removed convert('RGB') for both images and masks and it still works, but the error comes back when I remove

mask = TF.to_grayscale(mask)

from the transform method above. The traceback and error are as follows:

Traceback (most recent call last):
  File "data_loader_test.py", line 16, in <module>
    for idx, (data, target) in enumerate(dataloader):
  File "/home/user/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 314, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/user/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 187, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/home/user/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 187, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/home/user/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 164, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 1 and 3 in dimension 1 at /pytorch/aten/src/TH/generic/THTensorMath.cpp:3616

Hmm, the masks have shape [1, 512, 512], so it seems they are already grayscale and don’t need to be converted to ‘L’ or ‘P’? And TF.to_grayscale doesn’t change the shape of the masks, so where is the problem ...
:disappointed_relieved::disappointed_relieved::disappointed_relieved:

Oh, that was a silly mistake on my part.

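In my transform method I made a mistake, mask = resize(image), so a transformed image was assigned to the mask (the transform posted above has the same kind of slip in mask = TF.hflip(image)). The dimension mismatch error is then raised in default_collate when the batch is stacked; it is not the DataLoader’s fault.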
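
To illustrate why the error surfaces in default_collate rather than in the dataset: the traceback shows the batch being assembled with torch.stack, which requires every sample to have the same shape. If the wrong assignment sits inside a random branch, only some samples end up with a 3-channel "mask". The sketch below reproduces just that stacking step with plain tensors; `try_stack` is a hypothetical helper, not part of PyTorch.

```python
import torch


def try_stack(samples):
    """Mimic the stacking step of default collation; return the shape or an error tag."""
    try:
        return tuple(torch.stack(samples, 0).shape)
    except RuntimeError:
        return "RuntimeError"


# Two proper single-channel masks stack into a batch without trouble.
good_masks = [torch.zeros(1, 4, 4), torch.zeros(1, 4, 4)]
# Here the second "mask" is really a 3-channel image that was assigned by mistake.
mixed_masks = [torch.zeros(1, 4, 4), torch.zeros(3, 4, 4)]

good_result = try_stack(good_masks)
mixed_result = try_stack(mixed_masks)
print(good_result, mixed_result)
```

This would also explain why calling TF.to_grayscale on the mask hid the problem: it forces every sample back to one channel, even when the "mask" is actually a flipped image.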

My code is now as follows:

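import glob
import os
import random

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms
import torchvision.transforms.functional as TF

# Per-channel mean and std used to normalize the RGB images
rgb_mean = (0.4353, 0.4452, 0.4131)
rgb_std = (0.2044, 0.1924, 0.2013)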

class MyDataset(Dataset):
    def __init__(self,
                 config,
                 subset,
                 data_transforms= None,
                 target_transforms=None):
        super(MyDataset, self).__init__()
        assert subset == 'train' or subset == 'valid' or subset == 'test'
        self.config = config
        self.root = self.config.root_dir
        self.subset = subset
        self.data = self.config.data_folder_name
        self.target = self.config.target_folder_name

        self.data_transforms = data_transforms if data_transforms is not None else TF.to_tensor
        self.target_transforms = target_transforms if target_transforms is not None else TF.to_tensor

        self.mapping = {
            0: 0,
            255: 1,
        }
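        # Sort both lists so that images and masks pair up by index
        # (assumes matching filenames; glob order is otherwise arbitrary)
        self.data_list = sorted(glob.glob(os.path.join(
            self.root,
            subset,
            self.data,
            '*.tif'
        )))
        self.target_list = sorted(glob.glob(os.path.join(
            self.root,
            subset,
            self.target,
            '*.tif'
        )))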
    def mask_to_class(self, mask):
        for k in self.mapping:
            mask[mask == k] = self.mapping[k]
        return mask
    def transform(self, image, mask):
        resize = transforms.Resize(size=(self.config.input_size, self.config.input_size))
        image = resize(image)
        mask = resize(mask)
        # One random draw per flip, applied to both image and mask
        if random.random() > 0.5:
            image = TF.hflip(image)
            mask = TF.hflip(mask)
        if random.random() > 0.5:
            image = TF.vflip(image)
            mask = TF.vflip(mask)
        # Only the image is scaled to [0, 1] and normalized
        image = TF.to_tensor(image)
        image = TF.normalize(image, mean=rgb_mean, std=rgb_std)
        # Keep the raw pixel values of the mask, then map them to class indices
        mask = torch.from_numpy(np.array(mask, dtype=np.uint8))
        mask = self.mask_to_class(mask)
        mask = mask.long()
        return image, mask
    def __getitem__(self, index):
        datas = Image.open(self.data_list[index]).convert('RGB')
        targets = Image.open(self.target_list[index]).convert('P')
        t_datas, t_targets = self.transform(datas, targets)
        return t_datas, t_targets
    def __len__(self):
        return len(self.data_list)

And there are a few things to pay attention to:

  • To map a grayscale mask to class indices, opening it with PIL’s Image.open alone did not work for me; the mask should be read in mode P or L.
  • TF.to_tensor() scales images and masks to [0, 1], so to map a mask to class indices, convert it to a tensor with torch.from_numpy() instead. That keeps the raw pixel values, which can then be mapped to classes. Don’t forget to convert the dtype to long.
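
The two points above can be sketched as follows. This is a minimal illustration, assuming a binary mask with pixel values 0 and 255 as in the mapping from the post; `mask_to_class_tensor` is a hypothetical standalone helper, and it writes into a fresh tensor instead of modifying the mask in place.

```python
import numpy as np
import torch
from PIL import Image

MAPPING = {0: 0, 255: 1}  # pixel value -> class index, as in the post


def mask_to_class_tensor(mask_img, mapping=MAPPING):
    """Convert a single-channel PIL mask to a LongTensor of class indices."""
    raw = torch.from_numpy(np.array(mask_img, dtype=np.uint8))  # values stay in 0..255
    classes = torch.zeros_like(raw, dtype=torch.long)
    for pixel_value, class_index in mapping.items():
        classes[raw == pixel_value] = class_index
    return classes


mask_img = Image.fromarray(np.array([[0, 255], [255, 0]], dtype=np.uint8))  # mode "L"
target = mask_to_class_tensor(mask_img)

# For comparison: scaling to [0, 1] turns 255 into 1.0, so the value 255 no longer exists.
scaled = np.array(mask_img, dtype=np.float32) / 255.0
print(target, target.dtype, scaled.max())
```

Writing into a separate output tensor avoids one mapped value being remapped by a later entry, which could happen with an in-place loop if a mapping's outputs overlap its inputs.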

Hi, I’m trying to do semantic segmentation too. Could you share your GitHub?
I’m confused about the loss function.

Oh, sorry. :\
I have not completed the entire project yet, and after testing there is still something wrong with it: the training phase on GPU is too slow. I will upload my project to GitHub later; thanks for your interest. Also, I’m new to this domain :thinking:

I’m new too. :smile: Keep going!

The same to you. :wink: