I am very new to machine learning in general, and just started with PyTorch because of its simplicity. I am following the "Training a Classifier" section of the 60 Minute Blitz tutorial, and I cannot understand what these lines mean:
The output of torchvision datasets are PILImage images of range [0, 1]. We transform them to Tensors of normalized range [-1, 1].
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
Could you please help me understand it?
Yes, that was helpful. But I am actually confused about what "PILImage of range [0, 1]" means. Could you please elaborate?
Actually, it’s not accurate. The pixel values are in the range [0, 255], not [0, 1]. When you open an image with PIL, you get an object whose class depends on the file format (JPG or PNG):
from PIL import Image
img1 = Image.open('filename.jpg')
img2 = Image.open('filename.png')
To see the pixel values of these objects, you can use np.asarray(img1), which will show values in the range [0, 255]:
>>> import numpy as np
>>> np.asarray(img1)[:2, :2]
array([[[255, 255, 255],
        [255, 255, 255]],

       [[255, 255, 255],
        [255, 255, 255]]], dtype=uint8)
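If you do not have an image file at hand, the same check can be run on an image built in memory. This is only a minimal sketch: the 4x2 size and the single black pixel are arbitrary choices for illustration, not anything taken from the tutorial.

```python
import numpy as np
from PIL import Image

# Build a small white RGB image in memory; PIL takes size as (width, height).
img = Image.new("RGB", (4, 2), (255, 255, 255))
# Paint one pixel black so both ends of the value range appear.
img.putpixel((0, 0), (0, 0, 0))

arr = np.asarray(img)
print(arr.dtype)             # 8-bit unsigned integers
print(arr.shape)             # (height, width, channels)
print(arr.min(), arr.max())  # smallest and largest pixel value
```

The array has dtype uint8 and shape (2, 4, 3), and its values span 0 to 255, which is the range the pixels have before any transform is applied.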
The torchvision datasets return pixel values in the range 0-255, as @vmirly1 says, so yeah, there seems to be a typo of some sort in the tutorial.
The transforms.ToTensor() in your transform will convert them to the range 0-1 (documentation: https://pytorch.org/docs/master/torchvision/transforms.html#torchvision.transforms.ToTensor).
The literal formulation “PILImage of range [0, 1]” means that the individual pixel values lie between those two values.
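To make the two ranges concrete, here is a rough sketch of the arithmetic in plain Python, without torch. It assumes 8-bit input, where ToTensor divides each value by 255, and uses the formula (value - mean) / std that Normalize applies per channel. The function names to_unit_range and normalize are made up for this illustration and are not part of torchvision.

```python
def to_unit_range(pixel):
    """Scale an 8-bit pixel value from [0, 255] to [0.0, 1.0]."""
    return pixel / 255.0


def normalize(value, mean=0.5, std=0.5):
    """Apply (value - mean) / std, as Normalize does for each channel."""
    return (value - mean) / std


# Follow a few pixel values through both steps.
for pixel in (0, 128, 255):
    scaled = to_unit_range(pixel)
    print(pixel, "->", scaled, "->", normalize(scaled))
```

With mean 0.5 and std 0.5, a pixel value of 0 ends up at -1.0 and 255 ends up at 1.0, which is where the [-1, 1] range in the tutorial comes from.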
Hope this helped!
Okay now everything makes sense. Thank you @vmirly1 @axki !