It seems that torchvision Grayscale uses the following formula to convert RGB images to grayscale:
L = R * 0.2989 + G * 0.5870 + B * 0.1140
Other packages, such as OpenCV, use very similar conversion coefficients.
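Here is a minimal pure-Python sketch that compares two sets of coefficients: the torchvision weights quoted above, and the 0.299/0.587/0.114 weights that OpenCV documents for its RGB to GRAY conversion. `to_gray` and `max_difference` are illustrative helpers and are not part of either library. The input is assumed to be 8-bit values in [0, 255].

```python
# Weights quoted above for torchvision Grayscale (R, G, B).
TORCHVISION_WEIGHTS = (0.2989, 0.5870, 0.1140)
# Weights OpenCV documents for RGB->GRAY: Y = 0.299 R + 0.587 G + 0.114 B.
OPENCV_WEIGHTS = (0.299, 0.587, 0.114)


def to_gray(pixel, weights):
    """Weighted sum of an (R, G, B) pixel; returns a float gray value."""
    r, g, b = pixel
    wr, wg, wb = weights
    return wr * r + wg * g + wb * b


def max_difference(weights_a, weights_b, max_value=255):
    """Largest possible |gray_a - gray_b| for channel values in [0, max_value].

    The difference is linear in (R, G, B), so it peaks either where all
    positive coefficient gaps are maxed out or where all negative ones are.
    """
    diffs = [a - b for a, b in zip(weights_a, weights_b)]
    pos = sum(d for d in diffs if d > 0)
    neg = -sum(d for d in diffs if d < 0)
    return max(pos, neg) * max_value


white = (255, 255, 255)
print(to_gray(white, TORCHVISION_WEIGHTS))
print(to_gray(white, OPENCV_WEIGHTS))
print(max_difference(TORCHVISION_WEIGHTS, OPENCV_WEIGHTS))
```

The torchvision coefficients sum to 0.9999, so pure white comes out at about 254.97 instead of 255. On 8-bit input, the two conversions never differ by more than about 0.03.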
These ratios for merging RGB into one channel come from the BT.601 standard. As far as I understand, this standard was created for television and is based on two major concepts:
- The human eye is more sensitive to some colors than to others. For example, we perceive green light as brighter than red or blue light of the same intensity.
- Color representations differ between TV systems. These particular ratios are used in the PAL/NTSC standards, but other TV standards use different ratios.
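To make the second point concrete, here is a small sketch that compares the BT.601 luma weights with the BT.709 weights used for HDTV. Both give green the largest weight, but by different amounts. As a result, the same pixel gets a different gray value depending on which standard is chosen. The sketch uses the canonical BT.601 values 0.299/0.587/0.114. Torchvision's 0.2989 for red is a slightly different value.

```python
# Luma weights (R, G, B) from two TV standards.
BT601 = (0.299, 0.587, 0.114)     # SDTV
BT709 = (0.2126, 0.7152, 0.0722)  # HDTV


def luma(pixel, weights):
    """Weighted sum of an (R, G, B) pixel."""
    return sum(w * c for w, c in zip(weights, pixel))


# The same saturated primaries give different gray values under each standard.
for name, pixel in [("red", (255, 0, 0)), ("green", (0, 255, 0)), ("blue", (0, 0, 255))]:
    print(f"{name:5s}  BT.601={luma(pixel, BT601):7.2f}  BT.709={luma(pixel, BT709):7.2f}")
```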
Torchvision Grayscale is typically used in deep learning, where we take information from camera sensors (i.e. images) and process it to get non-visual results. And (surprise!) none of this is related to either the human eye or ancient TV standards.
Every time we convert digital camera images from RGB to grayscale, we lose some information. And the more distortion the conversion algorithm introduces, the more of the original information we lose.
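Here is a quick illustration of the first sentence. It assumes the output is quantized to 8 bits by rounding to the nearest integer. That is an assumption of this sketch; a real implementation may truncate instead. Mapping 256³ = 16,777,216 possible colors onto at most 256 gray levels is necessarily many-to-one, so visibly different colors can end up identical. `to_gray_u8` is a hypothetical helper.

```python
import itertools

# Weights quoted above for torchvision Grayscale (R, G, B).
WEIGHTS = (0.2989, 0.5870, 0.1140)


def to_gray_u8(pixel, weights=WEIGHTS):
    """Weighted sum of an (R, G, B) pixel, rounded and clamped to 0..255.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    value = sum(w * c for w, c in zip(weights, pixel))
    return max(0, min(255, round(value)))


# Two clearly different colors that land on the same gray level (30).
dark_red = (100, 0, 0)
dark_green = (0, 51, 0)
print(to_gray_u8(dark_red), to_gray_u8(dark_green))

# On a coarse 16x16x16 grid of colors, count how many distinct gray levels remain.
levels = range(0, 256, 17)  # 0, 17, ..., 255
grid = list(itertools.product(levels, repeat=3))
distinct = {to_gray_u8(p) for p in grid}
print(len(grid), "colors ->", len(distinct), "gray levels")
```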
It's pretty obvious that applying arbitrary coefficients while merging color channels leads to irreversible loss of image data. And there is no doubt that standards classified as "perceptual luminance-preserving conversion" are pretty arbitrary for most machine learning tasks.
So why should these RGB ratios still be a thing in Torch? Why not just treat all the colors equally?
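For comparison, here is a sketch of the equal-weight conversion the question proposes: averaging the three channels, i.e. using 1/3 for each. `equal_weight_gray` and `bt601_gray` are hypothetical helpers. They work on a tiny image stored as rows of (R, G, B) tuples.

```python
def equal_weight_gray(image):
    """Average the three channels of every pixel (weights of 1/3 each)."""
    return [[(r + g + b) / 3 for (r, g, b) in row] for row in image]


def bt601_gray(image):
    """Same image, converted with the torchvision weights quoted above."""
    return [[0.2989 * r + 0.5870 * g + 0.1140 * b for (r, g, b) in row] for row in image]


# A 1x3 image: pure red, pure green, pure blue.
image = [[(255, 0, 0), (0, 255, 0), (0, 0, 255)]]
print(equal_weight_gray(image))  # all three pixels get the same value
print(bt601_gray(image))         # three different values
```

Neither mapping is invertible; the two simply discard different information. In this toy case, equal weighting maps pure red, green and blue to the same value, while the weighted version keeps them apart. In PyTorch, the equal-weight variant for a float tensor of shape (..., 3, H, W) would be `img.mean(dim=-3, keepdim=True)`.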