So the lazy way to do this is gray = img.mean(1), where dim 1 is the channel dimension in NCHW layout (but be careful when you have an alpha channel).
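A minimal sketch of the lazy approach, assuming an NCHW batch (the shapes here are just for illustration):

```python
import torch

img = torch.rand(4, 3, 64, 64)  # NCHW batch of RGB images

# Average over the channel dim; keepdim=True keeps a singleton channel
gray = img.mean(1, keepdim=True)  # shape (4, 1, 64, 64)

# With RGBA input, drop the alpha channel first, e.g.:
# gray = img[:, :3].mean(1, keepdim=True)
```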
However, that isn’t a good way, as the R, G, and B channels are not perceived as equally bright. People have thought about this and come up with various weights. A great way to apply these weights is a pointwise convolution (the same kind you see in ResNets and friends to change the number of channels; here 3 channels in, 1 out, 1x1 kernel — use torch.nn.functional.conv2d with a weight of shape (1, 3, 1, 1)). But a weighted average is not the end of the story either; there is gamma correction etc.
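Here is a sketch of that pointwise convolution, using the ITU-R BT.601 luma weights as one common choice (other standards use slightly different numbers):

```python
import torch
import torch.nn.functional as F

img = torch.rand(4, 3, 64, 64)  # NCHW batch of RGB images

# BT.601 luma weights for R, G, B; shape (out_channels, in_channels, kH, kW)
weight = torch.tensor([0.299, 0.587, 0.114]).view(1, 3, 1, 1)

# 1x1 convolution = per-pixel weighted sum over channels
gray = F.conv2d(img, weight)  # shape (4, 1, 64, 64)
```

This is exactly a weighted sum over channels at every pixel, so it is equivalent to `0.299 * img[:, 0] + 0.587 * img[:, 1] + 0.114 * img[:, 2]`, just expressed as a conv.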
P.S.: @Filos92’s way works on PIL images, not tensors, but uses whatever conversion PIL provides for us (hopefully a good choice). The more PyTorch-y way to get rid of the singleton channel dimension would be img_gray.squeeze(1) rather than .view.
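For completeness, squeezing only dim 1 is safer than an unqualified squeeze(), since it can’t accidentally remove a batch dimension of size 1:

```python
import torch

img_gray = torch.rand(4, 1, 64, 64)  # e.g. the output of the pointwise conv
out = img_gray.squeeze(1)            # drops only the singleton channel dim
print(out.shape)                     # torch.Size([4, 64, 64])
```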