Understanding transform.Normalize( )

Hi all,

I am trying to understand the values that we pass to transforms.Normalize, for example the commonly seen ((0.5,0.5,0.5),(0.5,0.5,0.5)).

Is that the distribution we want our channels to follow? Or is that the mean and the variance we want to use to perform the normalization operation?

If the latter, after that step we should get values in the range [-1, 1]. Is this for the CNN to perform better? If we want to visualize a sample image with matplotlib, however, we need to undo that transformation first, right?

Is there a way I can get my values in the range [0,1]? Will that reduce the performance of my CNN?

This is the code I am using to plot a sample image, in case it helps someone.

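import os
import random

import matplotlib.pyplot as plt
import numpy as np
from torch.utils import data
from torchvision import transforms
from torchvision.datasets import SVHN, FashionMNIST as FMNIST

batch_size = 64  # not given in the original post; any value works

# 0 - Pre-define transformations
# ------------------------------

# ToTensor() converts a PIL image to a float tensor in [0, 1];
# Normalize() then maps each of the three RGB channels to [-1, 1].
transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# Fashion MNIST is grayscale (one channel), so it needs one mean and one std.
gray_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5,), (0.5,))])

# 1 - Loading the dataset
# -----------------------

data_path = os.path.join(os.getcwd(), 'datasets')
#dataset = 'Fashion MNIST'
dataset = 'SVHN'

if dataset == 'Fashion MNIST':
    root = os.path.join(data_path, 'FMNIST')
    train_dataset = FMNIST(root = root, download = True, train = True, transform = gray_transform)
    test_dataset  = FMNIST(root = root, download = False, train = False, transform = gray_transform)

if dataset == 'SVHN':
    root = os.path.join(data_path, 'SVHN')
    train_dataset = SVHN(root = root, download = True, split = 'train', transform = transform)
    test_dataset  = SVHN(root = root, download = True, split = 'test', transform = transform)

train_loader = data.DataLoader(dataset = train_dataset,
                               batch_size = batch_size,
                               shuffle = True)

# evaluate on the test split; no need to shuffle it
test_loader = data.DataLoader(dataset = test_dataset,
                              batch_size = batch_size,
                              shuffle = False)

# Grab a sample image
idx = random.randint(0, len(train_dataset) - 1)
tensor = train_dataset[idx][0]            # (C, H, W), values in [-1, 1]
image = tensor.numpy()
# min-max rescale to [0, 1] so matplotlib can display it
image = (image - np.min(image)) / (np.max(image) - np.min(image))
image = image.transpose((1, 2, 0))        # (C, H, W) -> (H, W, C)
plt.imshow(np.squeeze(image))             # squeeze drops C for grayscale
plt.show()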

Normalize does the following for each channel:

image = (image - mean) / std

The parameters mean and std are passed as 0.5, 0.5 in your case. If the input is in [0, 1] (as it is after ToTensor()), this maps the image to the range [-1, 1]. For example, the minimum value 0 will be converted to (0-0.5)/0.5=-1, and the maximum value 1 will be converted to (1-0.5)/0.5=1.

If you would like to get your image back in the [0, 1] range, you could use:

image = ((image * std) + mean)

As for whether it helps the CNN learn better, I'm not sure. But the majority of the papers I have read employ some normalization scheme, and what you are following is one of them.

Hope it helps.


To answer the question above: yes, normalization does help a CNN perform better.
Normalization brings the data into a consistent range centered around zero, which usually helps the network learn faster and more stably.


Does the order of the operations in transforms.Compose matter? The mean value of my images is generally around [127.5, 127.5, 127.5]; can this also be written as transforms.Compose([transforms.Normalize([127.5, 127.5, 127.5], [127.5, 127.5, 127.5])])?


There are three values in each tuple of ((0.5,0.5,0.5),(0.5,0.5,0.5)), and the tuple is written twice. I recently started Python with deep learning, so this is confusing me. As you mentioned, the arguments are the mean and std, so shouldn't it be written as (0.5, 0.5)? Why do we have (0.5,0.5,0.5)? What does the third 0.5 represent? And secondly, why do we have these values twice?


If you read the documentation here, you will see that both parameters are “Sequences for each channel”. Color images have three channels (red, green, blue), therefore you need three parameters to normalize each channel. The first tuple (0.5, 0.5, 0.5) is the mean for all three channels and the second (0.5, 0.5, 0.5) is the standard deviation for all three channels.


But why [-1, 1] when the transformation is applied to data that is already normalized to [0, 1]?


This is very well explained by @InnovArul above.
It depends on which normalization method you are using.
Using the normalization transform mentioned above will transform the dataset into the normalized range [-1, 1].
If the dataset is already in the range [0, 1] and normalized, you can choose to skip the normalization transform.
You can also choose to normalize and keep the data in the range [0, 1] by tweaking the mean and std passed to the transform.

In my limited understanding, normalization and scaling are two different data preprocessing steps.
Scaling is used to bring your data into [0, 1],
whereas normalization adjusts your data distribution to make training easier.

import numpy as np
import torch
import torchvision.transforms.functional as TF

# random 3x3 RGB image in (H, W, C) layout with uint8 values in [0, 255]
image = torch.randint(0, 256, (3, 3, 3), dtype=torch.uint8)

# scaling: to_tensor converts (H, W, C) uint8 -> (C, H, W) float in [0, 1]
scaled_image = TF.to_tensor(np.asarray(image))
print(scaled_image)
tensor([[[0.2078, 0.3765, 0.9451],
         [0.2039, 0.3961, 0.5176],
         [0.2588, 0.5333, 0.2039]],

        [[0.0941, 0.8980, 0.6745],
         [0.2431, 0.7451, 0.1255],
         [0.5412, 0.4667, 0.2471]],

        [[0.2000, 0.8588, 0.6902],
         [0.1137, 0.1255, 0.2000],
         [0.6863, 0.2392, 0.2118]]])

# normalization: (x - mean) / std per channel, here with mean = std = 0.5
mean, std = [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]
normalized_image = TF.normalize(scaled_image, mean, std)
print(normalized_image)
tensor([[[-0.5843, -0.2471,  0.8902],
         [-0.5922, -0.2078,  0.0353],
         [-0.4824,  0.0667, -0.5922]],

        [[-0.8118,  0.7961,  0.3490],
         [-0.5137,  0.4902, -0.7490],
         [ 0.0824, -0.0667, -0.5059]],

        [[-0.6000,  0.7176,  0.3804],
         [-0.7725, -0.7490, -0.6000],
         [ 0.3725, -0.5216, -0.5765]]])

If I am wrong, please correct me.
Thanks in advance.


@MariosOreo you are correct.
Scaling and normalization are different.
Scaling only states that the data will be within the given range.
Normalization states that the data is proportionate within the given range.
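To make the difference between scaling and normalization concrete, a sketch comparing min-max scaling (which only fixes the range) with z-score standardization (which fixes the mean and standard deviation). The sample values are made up for illustration.

```python
import torch

x = torch.tensor([10.0, 20.0, 30.0, 100.0])  # arbitrary example data

# scaling (min-max): squeezes the values into [0, 1]; mean and std are not controlled
scaled = (x - x.min()) / (x.max() - x.min())

# normalization (z-score): mean 0 and std 1; the range is not fixed
normalized = (x - x.mean()) / x.std()

print(scaled)
print(normalized.mean().item(), normalized.std().item())
```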


May I ask how to choose the mean and std values for each image channel? Are there any suggestions?
Moreover, can we add a parameter so that the CNN learns the optimal values for this preprocessing itself? If so, how would I set that parameter?

What if the image is grey scale?
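A common approach is to compute the per-channel mean and std of the training set after ToTensor() and pass those values to Normalize. A minimal sketch, assuming the dataset yields (C, H, W) float tensors of equal size; a TensorDataset of random data stands in for a real dataset, and `channel_stats` is a hypothetical helper. For grayscale images C = 1, so it returns one-element lists.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def channel_stats(dataset, batch_size=64):
    """Hypothetical helper: per-channel mean and std over all pixels of a dataset."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)
    total, total_sq, n_pixels = None, None, 0
    for images, _ in loader:
        # images: (B, C, H, W) -> sum over batch, height and width
        s = images.sum(dim=(0, 2, 3))
        sq = (images ** 2).sum(dim=(0, 2, 3))
        total = s if total is None else total + s
        total_sq = sq if total_sq is None else total_sq + sq
        n_pixels += images.shape[0] * images.shape[2] * images.shape[3]
    mean = total / n_pixels
    std = (total_sq / n_pixels - mean ** 2).sqrt()  # population std
    return mean.tolist(), std.tolist()

# stand-in data: 100 random grayscale 28x28 images with dummy labels
images = torch.rand(100, 1, 28, 28)
labels = torch.zeros(100, dtype=torch.long)
mean, std = channel_stats(TensorDataset(images, labels))
print(mean, std)  # roughly [0.5] and [0.29] for uniform noise
```

The resulting lists can be passed directly as transforms.Normalize(mean, std).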


The link below might help you.

As I understand from several resources, the normalization setting below is taken from ImageNet, but I also wonder about the intuition behind it.

transforms.Normalize(mean=[0.485, 0.456, 0.406],
                     std=[0.229, 0.224, 0.225]) 

For images with pixel values in [0, 1], such normalization seemed to ruin the image in my experience, though I may be wrong.


For image tensors with values in [0, 1] this transformation will standardize it, so that the mean of the data should be ~0 and the std ~1.
This is also known as Standard score or z-score in the literature, and usually helps your training.


Thank you very much for the information.
I should admit that this is my first week with PyTorch, and I have found these forums an extremely valuable learning resource.

I have a toy dataset for classifying dog images. When I perform the normalization mentioned above, without changing any other settings on the data loaders, and plot a batch with

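import matplotlib.pyplot as plt
import numpy as np

dataiter = iter(load_data['train'])
images, labels = next(dataiter)   # newer PyTorch versions no longer provide dataiter.next()
images = images.numpy()
fig = plt.figure(figsize=(20, 4))
for idx in np.arange(10):
    ax = fig.add_subplot(2, 10//2, idx+1, xticks=[], yticks=[])  # column count must be an int
    plt.imshow(np.transpose(images[idx], (1, 2, 0)))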

I get this result as shown in the image

When I remove the normalization, it plots normal dog images. When I print the tensor, I can see that the input values are positive and within [0, 1]; after normalization they are between [-1, 1].
I don't know whether the problem is the normalization or the matplotlib snippet.
PyTorch version is 0.4.1, Python 3.5.

The messy output is expected, as matplotlib either clips the input or tries to rescale it, which creates these kinds of artifacts (also because you are normalizing each channel with different values).

If you would like to visualize the images, you should use the raw images (in [0, 255]) or the ones scaled to [0, 1] by ToTensor(), before Normalize is applied.
Alternatively, you could also unnormalize them, but I think the first approach would be simpler.
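If you do want to plot the normalized batch, a sketch of the unnormalize route: undo Normalize channel-wise on the whole (B, C, H, W) batch, clip to [0, 1], and move the channels last for imshow. It assumes the ImageNet mean/std from the earlier post were used; a random batch stands in for `images` from the data loader, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it in a notebook
import matplotlib.pyplot as plt
import torch

mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

# stand-in for a normalized batch from the data loader: (B, C, H, W)
raw = torch.rand(10, 3, 32, 32)
images = (raw - mean) / std

# undo (x - mean) / std, then clip away tiny numerical overshoot
unnorm = (images * std + mean).clamp(0, 1)
# (B, C, H, W) -> (B, H, W, C), the layout matplotlib expects
np_images = unnorm.permute(0, 2, 3, 1).numpy()

fig = plt.figure(figsize=(20, 4))
for idx in range(10):
    ax = fig.add_subplot(2, 5, idx + 1, xticks=[], yticks=[])
    ax.imshow(np_images[idx])
# call plt.show() in an interactive session
```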

If you are using a custom Dataset, just add a separate method that loads the raw image (load_image below) and use it for visualization:

from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class MyDataset(Dataset):
    def __init__(self, image_paths, targets, transform=None):
        self.image_paths = image_paths
        self.targets = targets
        self.transform = transform

    def load_image(self, index):
        # returns the raw PIL image, without any transform applied
        image_path = self.image_paths[index]
        img = Image.open(image_path)
        return img

    def __getitem__(self, index):
        x = self.load_image(index)
        y = self.targets[index]
        if self.transform:
            x = self.transform(x)
        return x, y

    def __len__(self):
        return len(self.image_paths)

image_paths = [...]
targets = ...
dataset = MyDataset(image_paths, targets, transform=transforms.ToTensor())
img_to_vis = dataset.load_image(index=0)

PS: Unrelated to your question, but your PyTorch version is quite old. I would recommend updating it to the latest stable version. You'll find the install instructions here.


@bhushans23 what do you mean when you say proportionate within the given range?
Thank you in advance!

If you are starting with .png images in the range 0-255, do you first need to convert them to [0, 1] and to some other image format before using transforms.Normalize()? Or can I just normalize them as-is with means/stds like transforms.Normalize((120,120,120),(30,30,30))?