As the title describes, the images in the dataset I use have 3 color channels even though they are grayscale. So I used the transforms.Grayscale(num_output_channels=1) transformation to reduce the number of color channels to 1, but the loaded images still have 3 channels.
Here is my implementation:
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

data_transforms = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

face_train_dataset = datasets.ImageFolder(root=DEST_PATH_TRAIN,
                                          transform=data_transforms)
train_loader = DataLoader(face_train_dataset,
                          batch_size=train_batch_size,
                          shuffle=True, num_workers=4)
If I run your code, I get the valid error:
RuntimeError: output with shape [1, 334, 690] doesn't match the broadcast shape [3, 334, 690]
which is thrown since Grayscale returns a single-channel image, while Normalize uses three values for mean and std. Changing mean and std to a single value each yields valid output tensors of shape
[batch_size, 1, 334, 690].
No, I wasn't aware of that, thanks @ptrblck. I thought it would still work as expected.
By the way, after decreasing the number of channels of the images, which were already grayscale (8-bit depth images), the accuracy of the network dropped dramatically. That was the only change. Are there any workarounds, or is there a point I'm missing? @ptrblck
Are you sure the images are not stored in 16-bit depth?
Anyway, are you training your model from scratch or finetuning a pretrained model?
Was the accuracy better when you loaded the depth images as 3-channel images?
Yes, here are the details of an image (a sample image from the FER2013 dataset):
I have built my model from scratch and am now trying to improve its accuracy. Yes, the accuracy was much better when I loaded the images as 3-channel: it dropped from about 50% to about 25% after the change.