I have 10 input color images ([10, 3, 224, 224]) and I want to convert them to grayscale in PyTorch:

torch.Size([10, 3, 224, 224]) --> torch.Size([10, 224, 224])

How can I do that? Should I choose one channel from RGB, or is there a function to convert RGB to grayscale?

The size after converting to grayscale isn’t [10,224,224] but [10,1,224,224].
You can do this easily with

torchvision.transforms.Grayscale(num_output_channels=1)

Here is a little tutorial for this:

If you need the form [10,224,224], you can use


So the lazy way to do this is gray = img.mean(1) (but be careful when you have an alpha channel).
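A small sketch of the averaging approach. Note that `mean(1)` already drops the channel dimension, giving exactly the [10, 224, 224] shape from the question. The `rgba` tensor is a hypothetical four-channel batch showing why the alpha caveat matters: slicing to the first three channels keeps alpha out of the average.

```python
import torch

imgs = torch.rand(10, 3, 224, 224)

# Unweighted average over the channel dimension (dim 1)
gray = imgs.mean(1)                     # shape [10, 224, 224]
gray_keep = imgs.mean(1, keepdim=True)  # shape [10, 1, 224, 224]

# Hypothetical RGBA batch: average only R, G, B so alpha does not leak in
rgba = torch.rand(10, 4, 224, 224)
gray_rgba = rgba[:, :3].mean(1)         # shape [10, 224, 224]
```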
However, that isn’t a good way, as the R, G and B channels do not contribute equally to perceived brightness. People have thought about this and came up with various weights. A great way to apply these weights is a pointwise convolution, the same 1x1 convolution you see in ResNets and friends to change the number of channels: here 3 channels in, 1 out, with a 1x1 kernel. Use torch.nn.functional.conv2d with a weight of shape (1, 3, 1, 1). But a weighted average is not the end of the story either; there is gamma correction etc.
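A sketch of the pointwise-convolution idea with `torch.nn.functional.conv2d`. The weights 0.299, 0.587, 0.114 (the ITU-R 601-2 luma coefficients that PIL documents for its "L" mode conversion) are one common choice, not the only one. The `gamma_aware_gray` helper is a hypothetical illustration of the gamma caveat: it assumes sRGB-encoded input in [0, 1], linearizes it, applies the Rec. 709 luminance weights and re-encodes the result.

```python
import torch
import torch.nn.functional as F

imgs = torch.rand(10, 3, 224, 224)

# 1x1 convolution: 3 input channels -> 1 output channel, weight shape (1, 3, 1, 1)
bt601 = torch.tensor([0.299, 0.587, 0.114]).view(1, 3, 1, 1)
gray = F.conv2d(imgs, bt601)  # shape [10, 1, 224, 224]

# The same weighted sum written with broadcasting instead of a convolution
gray_sum = (imgs * bt601).sum(dim=1, keepdim=True)
assert torch.allclose(gray, gray_sum, atol=1e-5)


def srgb_to_linear(x):
    # Inverse sRGB transfer function: linear segment near 0, power 2.4 above
    return torch.where(x <= 0.04045, x / 12.92, ((x + 0.055) / 1.055) ** 2.4)


def linear_to_srgb(y):
    # Forward sRGB transfer function; the clamp keeps the unused branch away from 0
    return torch.where(y <= 0.0031308, y * 12.92,
                       1.055 * y.clamp(min=0.0031308) ** (1 / 2.4) - 0.055)


def gamma_aware_gray(x):
    # Hypothetical helper; assumes x holds sRGB-encoded values in [0, 1]
    rec709 = torch.tensor([0.2126, 0.7152, 0.0722],
                          dtype=x.dtype, device=x.device).view(1, 3, 1, 1)
    luminance = F.conv2d(srgb_to_linear(x), rec709)  # weighting in linear light
    return linear_to_srgb(luminance)


gray_gamma = gamma_aware_gray(imgs)  # shape [10, 1, 224, 224]
```

Weighting in linear light rather than on the encoded values is the main difference between the two results; which one is "right" depends on what the grayscale image is used for.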

There are many more links on Stack Overflow; particularly noteworthy is this study.

Best regards


P.S.: @Filos92’s way works on PIL images, not tensors (newer torchvision releases also accept tensors), but uses whatever PIL provides for us (hopefully a good choice). The more PyTorch-y way to get rid of the singleton dimension would be img_gray.squeeze(1) rather than .view.
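A short comparison of the two ways to drop the singleton channel dimension; `gray` is a random stand-in for a [N, 1, H, W] grayscale batch such as the output of `Grayscale(num_output_channels=1)`:

```python
import torch

gray = torch.rand(10, 1, 224, 224)  # stand-in for a [N, 1, H, W] grayscale batch

squeezed = gray.squeeze(1)                  # removes only dim 1, and only if its size is 1
viewed = gray.view(gray.size(0), 224, 224)  # also works, but spells out the spatial size

print(squeezed.shape)  # torch.Size([10, 224, 224])
```

Passing the dimension explicitly matters: a bare `squeeze()` would also remove the batch dimension when the batch size is 1, while `squeeze(1)` leaves it alone.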
