[Solved] Dimension mismatch when applying transforms to images and masks in a segmentation task

MariosOreo · January 21, 2019, 12:04pm

I am working on a semantic segmentation task and have to build a custom dataset.
The images are 24 bits per pixel and the masks are 8 bits per pixel.
My custom dataset is as follows:

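import glob
import os

from PIL import Image
from torch.utils.data import Dataset
import torchvision.transforms.functional as TF


class MyDataset(Dataset):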
    def __init__(self, root, set_name,):
        super(MyDataset, self).__init__()
        assert set_name == 'train' or set_name == 'val' or set_name == 'test'
        self.root = root
        self.set_name = set_name
        self.image_list = glob.glob(os.path.join(
            root,
            set_name,
            args.images_folder,
            "*.tif",
        ))
        self.label_list = glob.glob(os.path.join(
            root,
            set_name,
            args.labels_folder,
            "*.tif",
        ))

    def __getitem__(self, index):

        images = Image.open(self.image_list[index])
        masks = Image.open(self.label_list[index])
        t_images = TF.to_tensor(images)
        t_masks = TF.to_tensor(masks)
        return t_images, t_masks

    def __len__(self):

        return len(self.image_list)

But it raises an error like this:

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 3 and 1 in dimension 1 at /pytorch/aten/src/TH/generic/THTensorMath.cpp:3616

I think this is because the masks have only a single channel, so I changed the code:

        images = Image.open(self.image_list[index]).convert('RGB')
        masks = Image.open(self.label_list[index]).convert('RGB')

Then it works, but I have to reshape the masks when I feed the data and target to the network.
So I would like to know whether there is a solution that avoids changing the mask format when reading the dataset. I do not know whether the operations above will have bad side effects.

Thanks in advance

vmirly1 · January 21, 2019, 2:43pm

I don’t think this problem is related to the DataLoader, because in this class you are not mixing the two tensors t_images and t_masks. I think the problem happens later on in your code. As a test, can you print the sizes of the tensors loaded from the data_loader:

batch_x, batch_y = next(iter(data_loader))
print(batch_x.shape, batch_y.shape)

If this works, then we can confirm that there is no problem with the MyDataset class that you have defined.
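If the batch cannot even be built, one way to narrow things down is to bypass the DataLoader and inspect the samples one at a time. A minimal sketch, assuming a hypothetical ToyDataset that stands in for MyDataset and deliberately returns one 3-channel mask; the helper find_bad_samples is also made up for illustration:

```python
import torch
from torch.utils.data import Dataset


class ToyDataset(Dataset):
    """Hypothetical stand-in for MyDataset: sample 2 returns a 3-channel mask by mistake."""

    def __len__(self):
        return 4

    def __getitem__(self, index):
        image = torch.zeros(3, 8, 8)
        channels = 3 if index == 2 else 1
        mask = torch.zeros(channels, 8, 8)
        return image, mask


def find_bad_samples(dataset, expected_mask_channels=1):
    """Return (index, shape) for every sample whose mask has an unexpected channel count."""
    bad = []
    for index in range(len(dataset)):
        _, mask = dataset[index]
        if mask.shape[0] != expected_mask_channels:
            bad.append((index, tuple(mask.shape)))
    return bad


bad_samples = find_bad_samples(ToyDataset())
print(bad_samples)
```

An empty list would mean every mask has the expected channel count, so the samples can be stacked into a batch.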

MariosOreo · January 22, 2019, 1:21am

Hi @vmirly1, thanks for your response.
I tried the code you suggested, as follows:

    dst = MyDataset(root = args.dataset_root_dir, set_name="train")
    #print(len(dst)) 
    data_loader = DataLoader(dataset=dst, batch_size=args.batch_size, shuffle=False)
    batch_x, batch_y = next(iter(data_loader))
    print(batch_x.shape, batch_y.shape)

And the shapes of the data and target are:

torch.Size([10, 3, 512, 512]) torch.Size([10, 1, 512, 512])

This confirms that MyDataset has no problem.
I found that the dimension mismatch error occurs when I apply the TF.vflip() and TF.hflip() transforms, and I solved the problem like this:

def transform(self, image, mask):
        # Resize
        resize = transforms.Resize(size=(args.input_size, args.input_size))
        image = resize(image)
        mask = resize(mask)
        # Random horizontal flipping
        if random.random() > 0.5:
            image = TF.hflip(image)
            mask = TF.hflip(image)
        # Random vertical flipping
        if random.random() > 0.5:
            image = TF.vflip(image)
            mask = TF.vflip(mask)
        mask = TF.to_grayscale(mask)
        # Transform to Tensor
        image = TF.to_tensor(image)
        mask = TF.to_tensor(mask)
        image = TF.normalize(image, mean=[0.4353, 0.4452, 0.4131],
                                    std=[0.2044, 0.1924, 0.2013])
        return image, mask

This means I have to read the images and masks as RGB, convert the masks to grayscale after the transforms, and then I get masks of shape [batch_size, 1, height, width] from the DataLoader. Is there a better solution?

Oh, I will go and re-edit the topic name. Thanks a lot.

vmirly1 · January 22, 2019, 3:39am

Okay, so now we know that the MyDataset class is ok.

But now, why do you have this other transform() function? Have you removed the transformations in __getitem__ and instead call this function inside __getitem__?

Also, you shouldn’t have to read the masks in RGB. You can read them directly as grayscale.

MariosOreo · January 22, 2019, 4:00am

Oh yes, I removed the transformations from __getitem__. I defined a method transform in the class MyDataset and call it in __getitem__.

The code is as follows:

class MyDataset(Dataset):
    def __init__(self, root,set_name,):
        super(MyDataset, self).__init__()
        assert set_name == 'train' or set_name == 'val' or set_name == 'test'
        self.root = root
        self.set_name = set_name
        self.mapping = {
            0: 0,
            255: 1
        }
        self.image_list = glob.glob(os.path.join(
            root,
            set_name,
            args.images_folder,
            "*.tif",
        ))
        self.label_list = glob.glob(os.path.join(
            root,
            set_name,
            args.labels_folder,
            "*.tif",
        ))

    def mask_to_class(self, mask):
        for k in self.mapping:
            mask[mask == k] = self.mapping[k]
        return mask

    def transform(self, image, mask):
        # Resize
        resize = transforms.Resize(size=(args.input_size, args.input_size))
        image = resize(image)
        mask = resize(mask)
        # Random horizontal flipping
        if random.random() > 0.5:
            image = TF.hflip(image)
            mask = TF.hflip(image)
        # Random vertical flipping
        if random.random() > 0.5:
            image = TF.vflip(image)
            mask = TF.vflip(mask)
        # masks to gray_scale
        mask = TF.to_grayscale(mask)
        # Transform to Tensor
        image = TF.to_tensor(image)
        mask = TF.to_tensor(mask)
        mask = self.mask_to_class(mask)
        # Normalized only images? Yes
        image = TF.normalize(image, mean=[0.4353, 0.4452, 0.4131],
                             std=[0.2044, 0.1924, 0.2013])
        return image, mask

    def __getitem__(self, index):

        images = Image.open(self.image_list[index]).convert('RGB')
        masks = Image.open(self.label_list[index]).convert('RGB')
        t_images, t_masks = self.transform(images, masks)
        return t_images, t_masks

    def __len__(self):

        return len(self.image_list)

And I test the dataset and dataloader like this:

    dst = MyDataset(root = args.dataset_root_dir, set_name="train")
    #print(len(dst)) 
    data_loader = DataLoader(dataset=dst, batch_size=args.batch_size, shuffle=False)
    for i, (data, target) in enumerate(data_loader):
        print(i)
        print(data.shape)
        print(target.shape)
        break

data.shape is [batch_size, 3, height, width]
target.shape is [batch_size, 1, height, width]

But when I read the masks as grayscale, a dimension mismatch error occurs when I do a vertical or horizontal flip.

How can I read the images as RGB and the masks as grayscale directly, without so many conversions?
Thank you very much!

vmirly1 · January 22, 2019, 4:21am

But I am able to read a gray-scale image as is, without needing to convert that to RGB:

>>> img = Image.open('img-gray.jpg')
>>> img_hflip = TF.hflip(img)
>>> img_tensor = TF.to_tensor(img_hflip)
>>> img_tensor.shape
torch.Size([1, 218, 178])

I still believe this error is related to things outside the dataloader.

MariosOreo · January 22, 2019, 5:39am

Okay, you are right. There is no need to convert to RGB before applying the transforms. I removed convert('RGB') for images and masks and it still works, but when I removed

mask = TF.to_grayscale(mask)

from the transform method above, I got the following traceback and error:

Traceback (most recent call last):
  File "data_loader_test.py", line 16, in <module>
    for idx, (data, target) in enumerate(dataloader):
  File "/home/user/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 314, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/user/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 187, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/home/user/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 187, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/home/user/.local/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 164, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 1 and 3 in dimension 1 at /pytorch/aten/src/TH/generic/THTensorMath.cpp:3616

Hmm, the masks have shape [1, 512, 512], so it seems they are already grayscale and don’t need to be converted to ‘L’ or ‘P’? And the function TF.to_grayscale doesn’t change the shape of the masks, so where is the problem…
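For reference, the traceback ends in torch.stack inside default_collate, which requires every sample in the batch to have the same shape. A minimal sketch that reproduces the same kind of failure with made-up tensors (the shapes and the helper try_stack are illustrative, not taken from the dataset):

```python
import torch

# Two hypothetical mask samples: one single-channel, one accidentally 3-channel.
good_mask = torch.zeros(1, 4, 4)
bad_mask = torch.zeros(3, 4, 4)


def try_stack(samples):
    """Return the stacked batch shape, or None if the sample shapes disagree."""
    try:
        return tuple(torch.stack(samples, 0).shape)
    except RuntimeError:
        return None


same_shapes = try_stack([good_mask, good_mask])  # stacks into one batch tensor
mixed_shapes = try_stack([good_mask, bad_mask])  # fails the way default_collate does
print(same_shapes, mixed_shapes)
```

So "Got 1 and 3 in dimension 1" suggests that some masks reach the collate step with 1 channel and others with 3.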

MariosOreo · January 23, 2019, 1:27pm

Oh, this was a stupid mistake.

In my transform method I made a mistake, mask = resize(image), so the dimension mismatch error occurs in default_collate inside the DataLoader. It is not the DataLoader’s problem.

My code is as follows:

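import glob
import os
import random

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms
import torchvision.transforms.functional as TF

rgb_mean = (0.4353, 0.4452, 0.4131)
rgb_std = (0.2044, 0.1924, 0.2013)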

class MyDataset(Dataset):
    def __init__(self,
                 config,
                 subset,
                 data_transforms= None,
                 target_transforms=None):
        super(MyDataset, self).__init__()
        assert subset == 'train' or subset == 'valid' or subset == 'test'
        self.config = config
        self.root = self.config.root_dir
        self.subset = subset
        self.data = self.config.data_folder_name
        self.target = self.config.target_folder_name

        self.data_transforms = data_transforms if data_transforms is not None else TF.to_tensor
        self.target_transforms = target_transforms if target_transforms is not None else TF.to_tensor

        self.mapping = {
            0: 0,
            255: 1,
        }
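        # sorted() makes the order deterministic; pairing by index assumes
        # that images and masks share the same file names
        self.data_list = sorted(glob.glob(os.path.join(
            self.root,
            subset,
            self.data,
            '*.tif'
        )))
        self.target_list = sorted(glob.glob(os.path.join(
            self.root,
            subset,
            self.target,
            '*.tif'
        )))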
    def mask_to_class(self, mask):
        for k in self.mapping:
            mask[mask == k] = self.mapping[k]
        return mask
    def transform(self, image, mask):
        # Resize image and mask to the same square input size
        resize = transforms.Resize(size=(self.config.input_size, self.config.input_size))
        image = resize(image)
        mask = resize(mask)
        # Apply the same random flips to both image and mask
        if random.random() > 0.5:
            image = TF.hflip(image)
            mask = TF.hflip(mask)
        if random.random() > 0.5:
            image = TF.vflip(image)
            mask = TF.vflip(mask)
        image = TF.to_tensor(image)
        image = TF.normalize(image, mean=rgb_mean, std=rgb_std)
        # Keep the raw pixel values (no [0, 1] scaling) so they can be mapped to class indices
        mask = torch.from_numpy(np.array(mask, dtype=np.uint8))
        mask = self.mask_to_class(mask)
        mask = mask.long()
        return image, mask
    def __getitem__(self, index):
        datas = Image.open(self.data_list[index]).convert('RGB')
        targets = Image.open(self.target_list[index]).convert('P')
        t_datas, t_targets = self.transform(datas, targets)
        return t_datas, t_targets
    def __len__(self):
        return len(self.data_list)

And there are a few things to pay attention to:

If you want to map a grayscale mask to class indices, the plain Image method from PIL doesn’t work on its own; we should read the masks in mode P or L.
TF.to_tensor() scales images and masks to [0, 1], so to turn the masks into class indices we should convert them with torch.from_numpy() instead and then apply the class mapping. Don’t forget to convert the dtype to long.

Xiaoyu_Song · January 24, 2019, 6:06am

Hi, I’m trying to do semantic segmentation too. Could you share your GitHub?
I’m confused about the loss function.

MariosOreo · January 24, 2019, 6:10am

Oh, sorry. :\
I have not completed the entire project yet, and after testing I found that something is wrong with it: the training phase on the GPU is too slow. I will upload my project to GitHub later. Thanks for your attention. Also, I’m new to this domain.

Xiaoyu_Song · January 24, 2019, 7:45am

I’m new too. Keep going.

MariosOreo · January 24, 2019, 11:58am

The same to you.