Loaded images do have 3 channels despite using Grayscale transformation

As the title describes, the images in my dataset have 3 color channels even though they are grayscale. So I used the transforms.Grayscale(num_output_channels=1) transformation to reduce the number of color channels to 1, but the loaded images still have 3 channels.

Here is my implementation:

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

data_transforms = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

face_train_dataset = datasets.ImageFolder(root=DEST_PATH_TRAIN,
                                          transform=data_transforms)

train_loader = DataLoader(face_train_dataset,
                          batch_size=train_batch_size,
                          shuffle=True, num_workers=4)

If I run your code, I get the expected error:

RuntimeError: output with shape [1, 334, 690] doesn't match the broadcast shape [3, 334, 690]

which is raised because Grayscale returns a single-channel image, while Normalize expects three values per channel.
Changing the mean and std to single values yields valid output tensors of shape [batch_size, 1, 334, 690].


I wasn't aware of that, thanks @ptrblck. I thought it would still work as expected.

By the way, after decreasing the number of channels of the images, which were already grayscale (8-bit depth images), the accuracy of the network dropped dramatically. That was the only change. Are there any workarounds or a point I am missing? @ptrblck

Are you sure the depth images are not stored as 16-bit?
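You can check this with PIL by inspecting the image mode. A small sketch (the file name is a placeholder; in practice you would open one of your dataset images instead of creating one):

```python
from PIL import Image

# Create a demo 8-bit grayscale file; replace with a real dataset image.
Image.new("L", (4, 4)).save("sample.png")

img = Image.open("sample.png")
# "L" means 8-bit grayscale, "I;16" would indicate 16-bit,
# "RGB" means the file is stored with 3 channels.
print(img.mode)
```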

Anyway, are you training your model from scratch or finetuning a pretrained model?
Was the accuracy better when you loaded the depth images as 3-channel images?

Yes, here are the details of a sample image from the FER2013 dataset:


I built my model from scratch and am now trying to improve its accuracy. Yes, the accuracy was much better when I loaded the images as 3-channel images; it has dropped from about 50% to about 25%.