PIL image and its normalisation


I am very new to machine learning in general, and just started with PyTorch because of its simplicity. I am following the "Training a Classifier" part of the 60 Minute Blitz tutorial, and I cannot understand what these lines mean:

The output of torchvision datasets are PILImage images of range [0, 1]. We transform them to Tensors of normalized range [-1, 1].

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

Could you please help me understand it?

Thank you!

I think this might be useful for understanding this: https://discuss.pytorch.org/t/understanding-transform-normalize/21730/2?u=axki

Yes, that was helpful. But I am actually confused about what "PILImage of range [0, 1]" means. Could you please elaborate?

Actually, that's not accurate. The pixel values are in the range [0, 255], not [0, 1]. When you open an image with PIL, you get an object of one of the following classes, depending on whether the file is a JPG or a PNG:

from PIL import Image

img1 = Image.open('filename.jpg')
print(type(img1))  # <class 'PIL.JpegImagePlugin.JpegImageFile'>

img2 = Image.open('filename.png')
print(type(img2))  # <class 'PIL.PngImagePlugin.PngImageFile'>

To see the pixel values of these objects, you can use list(img1.getdata()) or np.asarray(img1), which will show values in the range [0, 255]:

>>> import numpy as np
>>> np.asarray(img1)[:2, :2]
array([[[255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255]]], dtype=uint8)
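If you don't have an image file at hand, the same check can be done on an image built in memory. This is only a small sketch: the 2x2 size and the orange color are arbitrary choices for illustration, not anything from the tutorial.

```python
import numpy as np
from PIL import Image

# Build a tiny 2x2 RGB image in memory so no file on disk is needed.
# The color (255, 128, 0) is an arbitrary example value.
img = Image.new("RGB", (2, 2), color=(255, 128, 0))

# Converting to a NumPy array exposes the raw 8-bit pixel values.
arr = np.asarray(img)
print(arr.dtype)             # uint8
print(arr.shape)             # (2, 2, 3)
print(arr.min(), arr.max())  # 0 255
```

The values are 8-bit integers, so nothing has been scaled to [0, 1] at this point.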

The torchvision datasets return pixel values in the range 0-255, as @vmirly1 says, so yes, there seems to be a typo of some sort in the tutorial.

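The transforms.ToTensor() in your transform converts the image to a float tensor with values in the range 0-1 (documentation: https://pytorch.org/docs/master/torchvision/transforms.html#torchvision.transforms.ToTensor). transforms.Normalize then computes (x - 0.5) / 0.5 for each channel, which maps [0, 1] to [-1, 1].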
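To make the arithmetic concrete, here is a rough NumPy-only sketch of what ToTensor and Normalize do to the values. It only mimics the math (divide by 255, then subtract the mean and divide by the std); it is not torchvision's actual implementation, and the helper names to_tensor_like and normalize_like are made up for illustration.

```python
import numpy as np

def to_tensor_like(pixels):
    """Mimic ToTensor on a uint8 H x W x C array: scale to [0, 1], channels first."""
    scaled = pixels.astype(np.float32) / 255.0
    return scaled.transpose(2, 0, 1)  # H x W x C -> C x H x W

def normalize_like(tensor, mean, std):
    """Mimic Normalize: (x - mean) / std, applied per channel."""
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    return (tensor - mean) / std

# One row of three RGB pixels: black, mid-gray, white
pixels = np.array([[[0, 0, 0], [128, 128, 128], [255, 255, 255]]], dtype=np.uint8)

scaled = to_tensor_like(pixels)
normalized = normalize_like(scaled, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))

print(scaled.min(), scaled.max())          # 0.0 1.0
print(normalized.min(), normalized.max())  # -1.0 1.0
```

So a pixel value of 0 ends up at -1, a value of 255 ends up at 1, and everything else falls in between.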

The literal formulation "PILImage of range [0, 1]" means that the individual pixel values lie between those two values.

Hope this helped!

Okay, now everything makes sense. Thank you @vmirly1 @axki!