PIL image and its normalisation

Kakar_Nyori · April 28, 2020, 10:10am

Hi!

I am very new to machine learning in general and just started with PyTorch because of its simplicity. I am following the "Training a Classifier" part of the 60 Minute Blitz tutorial, and I cannot understand what these lines mean:

The output of torchvision datasets are PILImage images of range [0, 1]. We transform them to Tensors of normalized range [-1, 1].

import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

Could you please help me understand it?

Thank you!

axki · April 28, 2020, 10:13am

I think this might be useful for understanding this: https://discuss.pytorch.org/t/understanding-transform-normalize/21730/2?u=axki

Kakar_Nyori · April 28, 2020, 2:03pm

Yes, that was helpful. But I am still confused about what "PILImage of range [0, 1]" means. Could you please elaborate?

vmirly1 · April 28, 2020, 2:18pm

Actually, that's not accurate. The pixel values are in the range [0, 255], not [0, 1]. When you open an image with PIL, you get an object of one of the following classes, depending on whether the file is a JPG or a PNG:

from PIL import Image

img1 = Image.open('filename.jpg')
print(type(img1))
# <class 'PIL.JpegImagePlugin.JpegImageFile'>

img2 = Image.open('filename.png')
print(type(img2))
# <class 'PIL.PngImagePlugin.PngImageFile'>

To see the pixel values of these objects, you can use list(img1.getdata()) or np.asarray(img1), which will show values in the range [0, 255]:

>>> import numpy as np
>>> np.asarray(img1)[:2, :2]
array([[[255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255]]], dtype=uint8)
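
If you don't have an image file at hand, you can check the same thing with an image built in memory. This is a minimal sketch; the 2x2 solid-colour image and its colour are made up for illustration:

```python
import numpy as np
from PIL import Image

# Hypothetical 2x2 RGB image filled with one colour, created in memory instead of opened from disk
img = Image.new('RGB', (2, 2), color=(255, 128, 0))

arr = np.asarray(img)
print(arr.dtype, arr.shape)   # uint8 (2, 2, 3)  -> height, width, channels
print(arr.min(), arr.max())   # 0 255
```

The uint8 dtype is why the values can only lie in [0, 255] for an ordinary 8-bit RGB image.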

axki · April 28, 2020, 2:38pm

The torchvision datasets return PIL images with pixel values in the range 0-255, as @vmirly1 says, so it looks like the wording in the tutorial is a bit off.

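The transforms.ToTensor() in your transform converts the PIL image into a float tensor of shape (C, H, W) with values in the range [0.0, 1.0]; Normalize then shifts and scales that to [-1, 1] (documentation: https://pytorch.org/docs/master/torchvision/transforms.html#torchvision.transforms.ToTensor)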

Taken literally, "PILImage of range [0, 1]" means that the individual pixel values lie between those two values.

Hope this helped!

Kakar_Nyori · June 3, 2020, 5:22pm

Okay now everything makes sense. Thank you @vmirly1 @axki !