How to set up a dataset for DeepLabV3 training?

I’m attempting to train a custom DeepLabV3 model on custom data. I keep running into this error with my current training, dataset, and loss scripts.

RuntimeError: stack expects each tensor to be equal size, but got [400, 640] at entry 0 and [400, 640, 3] at entry 17

It seems that my dataloader is trying to stack my masks and images together?
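For reference, the default collate function in `torch.utils.data.DataLoader` calls `torch.stack` on each field of the batch, so every tensor in a batch must have exactly the same shape. A minimal sketch (the shapes below mirror the error message) reproduces it:

```python
import torch

# DataLoader's default collate stacks the per-sample tensors for each
# field; a single mis-shaped entry breaks the whole batch.
mask = torch.zeros(400, 640)       # a (H, W) mask, as expected
rgb = torch.zeros(400, 640, 3)     # a mask accidentally loaded with 3 channels

try:
    torch.stack([mask, rgb])
except RuntimeError as e:
    print(type(e).__name__)  # RuntimeError: stack expects each tensor to be equal size
```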

This is my dataset class:

import os

import cv2
import numpy as np
from torch.utils.data import Dataset


class custom_data(Dataset):
    def __init__(self, dataset_dir, transforms=None):
        self.data_dir = dataset_dir
        self.num_classes = 1
        self.paths = sorted(os.listdir(dataset_dir + "/images"))
        self.masks = sorted(os.listdir(dataset_dir + "/masks"))

        self.transform = transforms

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img_path = self.paths[idx]
        mask_path = self.masks[idx]
        img = cv2.imread(os.path.join(self.data_dir, "images", img_path))
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        mask = cv2.imread(os.path.join(self.data_dir, "masks", mask_path), cv2.IMREAD_UNCHANGED)
        mask = mask.astype(np.float32)

        if self.transform is not None:
            sample = self.transform(image=img, mask=mask)
        else:
            raise RuntimeError("No transforms applied; tensor conversion is required!")
        return sample["image"], sample["mask"]

I load my dataset and split it into train and val sets. Then I build a set of augmentations for the train and val sets using albumentations, and instantiate my dataloaders as follows:

dataloaders = {"train": DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True, drop_last=True, num_workers=10, pin_memory=True),
               "val": DataLoader(val_set, batch_size=BATCH_SIZE, shuffle=True, drop_last=True, num_workers=10, pin_memory=True)}

What could be causing this error? As far as I can tell my setup is very similar to ones I’ve found online, e.g. albumentations_examples/pytorch_semantic_segmentation.ipynb at master · albumentations-team/albumentations_examples · GitHub

Could you check whether any mask has 3 channels by printing its shape in __getitem__?
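A quick way to flag that automatically (a sketch; `check_mask` and the example shapes are illustrative, not from your code) is a small guard you can call right after loading each mask:

```python
import numpy as np

def check_mask(name, mask):
    # a segmentation mask should be 2-D (H, W); anything else will
    # collide with the images when the default collate stacks the batch
    if mask.ndim != 2:
        raise ValueError(f"{name}: unexpected mask shape {mask.shape}")
    return mask

check_mask("ok.png", np.zeros((400, 640)))        # passes silently
# check_mask("bad.jpg", np.zeros((400, 640, 3)))  # raises ValueError
```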

Ah, you are right: one mask loaded as a 3-channel image instead of single-channel. It was a .jpg rather than a .png like the others, so I guess OpenCV just assumed it was colour.
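For anyone hitting the same thing: reading masks with `cv2.imread(path, cv2.IMREAD_GRAYSCALE)` forces a single channel regardless of how the file was saved. Alternatively, a small numpy guard (a sketch, not from the original post) normalises whatever was loaded:

```python
import numpy as np

def to_single_channel(mask: np.ndarray) -> np.ndarray:
    """Collapse an accidentally 3-channel mask down to (H, W)."""
    if mask.ndim == 3:
        # a grey mask saved as RGB has identical channels; keep the first
        mask = mask[..., 0]
    return mask

print(to_single_channel(np.zeros((400, 640, 3))).shape)  # (400, 640)
print(to_single_channel(np.zeros((400, 640))).shape)     # (400, 640)
```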