Normalization of input image

I am a beginner to PyTorch. As I read the tutorials, I keep seeing this expression used to normalize the input data:

transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))

However, if I understand correctly, this step basically does

input[channel] = (input[channel] - mean[channel]) / std[channel]

according to the documentation.
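As a quick sanity check of that formula, here is a minimal sketch (using a random toy tensor in place of a real image) of what `Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` computes per channel:

```python
import torch

# What transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) does, written
# out by hand: per-channel (input - mean) / std. ToTensor() already scales
# pixels to [0, 1], so mean=std=0.5 maps the data to roughly [-1, 1].
img = torch.rand(3, 4, 4)                            # toy CHW "image" in [0, 1)
mean = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)   # broadcast over H, W
std = torch.tensor([0.5, 0.5, 0.5]).view(3, 1, 1)

out = (img - mean) / std
print(out.min().item() >= -1.0, out.max().item() <= 1.0)  # True True
```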

So the question is: in order to normalize an input image, why do we just set the mean and std to 0.5? Shouldn’t we first calculate the mean and std of the whole image and then use those values to normalize? I mean, it doesn’t have to be the case that every image has the same mean and std. Am I missing something here?

Any comments and ideas are highly appreciated! Thank you!


Well, yes. You have to compute it or look it up. For ImageNet you can just google those numbers, but if you work with a custom dataset it’d be good to compute them yourself.


Sorry, I am a beginner here, so maybe I am asking a silly question. Could you please tell me how to pass a sample-dependent mean and std to transforms.Normalize(mean=, std=)? Everything I find assigns fixed values to it. How can I pass a specific data point and its np.mean() and np.std() to transforms.Normalize?

Oh, sorry for the confusion. It’s not sample-dependent but dataset-dependent. For example, here you can find the parameters for ImageNet. In short, people compute an average over all the images of the dataset (assuming you are working with pictures and not some other kind of data),

as done in lines 195-196 of the example provided for torchvision models:

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

You may find this interesting


Oh, I see!
Yes, that’s my mistake. Thank you!

Hi. May I ask how to define the mean and std values for each image channel? My dataset consists of natural satellite images; should I just use 0.5, or compute the mean and std for each channel of each image?
Moreover, can we set a parameter so the CNN finds the optimal values (mean, std, or other weights/biases used in each channel) for the image processing? If so, can you tell me how to set that up?

You should compute the mean and std per channel over all the images in the dataset (or a representative subset if your dataset is very big).

Setting that preprocessing as a learnable parameter is not very practical; it’s just a statistical normalization.

In the worst case you can set everything to 0.5, which is an approximation.

Thank you very much. I have tested the results of different image-processing strategies, i.e. setting all the mean and std values to 0.5 versus computing the mean and std for each channel over all the images in the dataset, and the results indicate that the 0.5 setting is more conducive to improving the final accuracy.

Should we always normalize the input image?

It is strongly recommended. Networks may be able to fit any range of values, but normalization has been shown to improve performance.


May I know how you are able to calculate the mean and standard deviation for the datasets?

First of all, if the dataset is huge, pick a representative subset.
Then you just need to iterate over that set:
- Ideally, you would save all the pixel values (per channel) and compute the mean and std of those.
- Since this may be memory demanding, you can instead save the mean of each image (per channel) and average them. If all the images are the same size, this is equivalent to the previous formulation; otherwise it’s an approximation.

Once you have the mean, you can compute the std in an analogous way.
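The procedure above can be sketched as follows. This version accumulates per-channel sums of x and x² over all pixels, which stays memory-friendly and remains exact even when image sizes differ; the `images` list here is random data standing in for your own dataset:

```python
import torch

# Placeholder dataset: 100 random CHW images (substitute your own data).
images = [torch.rand(3, 32, 32) for _ in range(100)]

# Accumulate per-channel sums of x and x^2 over every pixel,
# then derive mean and (population) std from them.
n_pixels = 0
s = torch.zeros(3)
s2 = torch.zeros(3)
for img in images:
    n_pixels += img.shape[1] * img.shape[2]
    s += img.sum(dim=(1, 2))
    s2 += (img ** 2).sum(dim=(1, 2))

mean = s / n_pixels
std = (s2 / n_pixels - mean ** 2).sqrt()
print(mean, std)   # per-channel values to pass to transforms.Normalize
```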