RuntimeError: output with shape doesn't match the broadcast shape

I want to fine-tune ResNet50 on RGB images using transfer learning.
I resized the images to 224x224 and got this error when running on the GPU:

RuntimeError: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]

On the CPU everything works fine.

How can I fix this?

Could you post the line of code throwing this error?
It seems strange that your code is running fine on the CPU.

I get the error on the filepath line and on image = self.transform(image):

    def __getitem__(self, index):
        """Generate one sample of data."""
        filepath = os.path.join(self.root, "img", self.filenames[index])
        bbox = self.bbox[index]
        # Load the image in color (BGR channel order)
        image = cv2.imread(filepath, cv2.IMREAD_COLOR)
        # Crop to the bounding box: rows = y1:y2, cols = x1:x2
        image = image[bbox[1]:bbox[3], bbox[0]:bbox[2]]
        # Pad the crop with a constant white border
        image = cv2.copyMakeBorder(image, bbox[1], bbox[3], bbox[0], bbox[2],
                                   cv2.BORDER_CONSTANT, value=[255, 255, 255])
        attribute_label = self.attributes[index]
        #category_label = self.categories[index]

        if self.transform is not None:
            image = self.transform(image)

        return image, attribute_label

This line of code should also throw the error if you run it on the CPU, so that’s a bit strange.
Anyway, how did you define self.transform?

My transform:

transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize(size=(224, 224)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

Could you print the shape of the image before passing it to the transformation?
I’m not sure if OpenCV creates pseudo-color channels when loading a grayscale image, but if not, that could yield this error.
If that’s the case, you could expand your tensor in that dimension to create three identical color channels.
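For reference, a minimal sketch of that expansion using NumPy (with a PyTorch tensor of shape [1, H, W], `tensor.expand(3, -1, -1)` or `tensor.repeat(3, 1, 1)` would achieve the same result):

```python
import numpy as np

# Hypothetical grayscale image with a leading channel dimension, shape (1, H, W)
gray = np.random.rand(1, 224, 224).astype(np.float32)

# Repeat the channel axis to get three identical channels, shape (3, H, W)
rgb = np.repeat(gray, 3, axis=0)

print(rgb.shape)  # (3, 224, 224)
```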

A few examples:

(432, 512, 3)
(346, 212, 3)
(600, 596, 3)
(298, 454, 3)
(472, 348, 3)
(516, 380, 3)
(458, 362, 3)
(530, 598, 3)
(478, 384, 3)

@ptrblck Here are all the errors

This error is usually thrown if you pass a grayscale image to a normalization that expects three values.
Could you add a check to see if all images contain three channels?
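A minimal sketch of such a check, assuming the images are loaded as NumPy arrays in H x W x C layout (with the dataset above, you would run it on each cv2.imread result before transforming):

```python
import numpy as np

def has_three_channels(image):
    """Return True if the image is an H x W x 3 array."""
    return image.ndim == 3 and image.shape[-1] == 3

# Hypothetical examples: one color image, one grayscale image
color = np.zeros((224, 224, 3), dtype=np.uint8)
gray = np.zeros((224, 224), dtype=np.uint8)

print(has_three_channels(color))  # True
print(has_three_channels(gray))   # False
```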

I checked, and I don’t have any images with a single channel…

Let me clarify: the number of values in mean and std must match the number of channels. If the image is RGB, the mean needs three values, e.g. [0.5, 0.5, 0.5], and each channel is normalized as (channel - mean) / std. If the image is grayscale with a single channel, the mean should hold a single value, e.g. [0.5].
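To see why the mismatch produces exactly this error, here is the arithmetic mimicked with NumPy (Normalize subtracts the mean in place, so a [1, H, W] tensor cannot hold the [3, H, W] broadcast result; shapes are assumed for illustration):

```python
import numpy as np

gray = np.zeros((1, 4, 4))  # single-channel image, shape (1, H, W)
mean = np.array([0.5, 0.5, 0.5]).reshape(3, 1, 1)  # three per-channel means

try:
    # In-place subtraction: the (3, H, W) broadcast result
    # does not fit into the (1, H, W) output array
    np.subtract(gray, mean, out=gray)
except ValueError as e:
    print(e)
```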


I’m working on Ubuntu 16.04 and using torch version 1.3.1.
The MNIST dataset consists of grayscale images, so

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])

can be changed to

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])
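As a sanity check of the single-value version, here is the (x - mean) / std arithmetic mimicked with NumPy for a single-channel image (shapes and values are assumed; torchvision’s Normalize performs the equivalent computation per channel):

```python
import numpy as np

img = np.full((1, 4, 4), 0.75)  # single-channel image in [0, 1], shape (1, H, W)
mean = np.array([0.5]).reshape(1, 1, 1)
std = np.array([0.5]).reshape(1, 1, 1)

out = (img - mean) / std        # broadcasts cleanly, shape stays (1, 4, 4)
print(out.shape, out[0, 0, 0])  # (1, 4, 4) 0.5
```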