Stack expects each tensor to be equal size, but got [200, 200, 3] at entry 0 and [200, 200] at entry 10

Hi,
I am trying to make an image classification model. To open the image I am using Image.open(img_path).
But I am getting this error:

stack expects each tensor to be equal size, but got [200, 200, 3] at entry 0 and [200, 200] at entry 10.

But when I use OpenCV to open the image, it works fine, like this:

img = cv2.imread(img_path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

Why isn’t it working with the PIL backend?
Here is my Dataset class:

class HolidayDataset(Dataset):
    def __init__(self, train_path, df, transform=None):
        self.train_path = train_path
        self.df = df
        self.transform = transform
        
    def __len__(self):
        return len(self.df)
    
    def __getitem__(self, idx):
        img_name = str(self.df.Image[idx])
        target = str(self.df.Class[idx])
        img_path = self.train_path + img_name
        img = Image.open(img_path)
        img = np.array(img)/255
        if target == "Miscellaneous":
            target = 0
        elif target == "Christmas_Tree":
            target = 1
        elif target == "Jacket":
            target = 2
        elif target == "Candle":
            target = 3
        elif target == "Airplane":
            target = 4
        elif target == "Snowman":
            target = 5
        if self.transform:
            img = self.transform(image=img)["image"]
        return {
            "img": torch.tensor(img, dtype=torch.float),
            "target": torch.tensor(target, dtype=torch.long)
        }

It would be great if someone could help!
Thanks :slight_smile:

PIL checks the image format, in particular the number of channels: it returns 3 channels for RGB images but drops the channel dimension for grayscale images, which is what happened for the image with shape [200, 200].
You could use Image.open(path).convert('RGB') to convert the grayscale images to RGB.
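A minimal sketch of that fix inside your `__getitem__` (the `load_rgb` helper name is just for illustration; the rest of your Dataset can stay unchanged):

```python
import numpy as np
from PIL import Image

def load_rgb(img_path):
    # Image.open returns whatever mode the file was saved in ("L" for
    # grayscale, "RGB" for color). convert('RGB') forces 3 channels, so
    # every sample has shape [H, W, 3] and the DataLoader can stack them.
    img = Image.open(img_path).convert('RGB')
    return np.array(img) / 255
```

This mirrors what `cv2.imread` does by default, which is why the OpenCV path never hit the error: it decodes grayscale files into 3-channel arrays unless you ask otherwise.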


Thanks a lot, got it!