As the title describes, the images in the dataset I use have 3 color channels even though they are grayscale. So I used the transforms.Grayscale(num_output_channels=1) transformation to reduce the number of color channels to 1, but the loaded images still have 3 channels.
Here is my implementation:
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

data_transforms = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

face_train_dataset = datasets.ImageFolder(root=DEST_PATH_TRAIN,
                                          transform=data_transforms)
train_loader = DataLoader(face_train_dataset,
                          batch_size=train_batch_size,
                          shuffle=True, num_workers=4)
If I run your code, I get the valid error:
RuntimeError: output with shape [1, 334, 690] doesn't match the broadcast shape [3, 334, 690]
which is thrown since Grayscale returns a single-channel image, while Normalize uses three values for mean and std. Changing mean and std to a single value each yields valid output tensors of shape
[batch_size, 1, 334, 690].
No, I wasn't aware of that, thanks @ptrblck. I thought it would still work as expected.
By the way, after decreasing the number of channels of the images, which were already grayscale (8-bit depth images), the accuracy of the network dropped dramatically. That was the only change. Are there any workarounds, or is there a point I'm missing? @ptrblck
Are you sure the images are not stored in 16-bit depth?
Anyway, are you training your model from scratch or finetuning a pretrained model?
Was the accuracy better when you loaded the depth images as 3-channel images?
Yes, here are the details of an image (a sample image from the FER2013 dataset):
I have built my model from scratch and am now trying to improve its accuracy. Yes, the accuracy was much better when I loaded the images as 3-channel: it dropped from about 50% to about 25% after the change.