So the question is: to normalize an input image, why do we just set the mean and std to 0.5 and 0.5? Shouldn’t we first calculate the mean and std of each whole image and then use those values to normalize? I mean, not every image has the same mean and std. Am I missing something here?

Any comments and ideas are highly appreciated! Thank you!

Well, yeah. You have to compute them or look them up. For ImageNet you can just google those numbers, but if you work with a custom dataset it’d be good to compute them yourself.

Sorry, I am a beginner here, so maybe I am asking silly questions. Could you please tell me how to pass a sample-dependent mean and std to transforms.Normalize(mean=, std=)? All the examples I find assign fixed values. How can I pass a specific data point’s np.mean() and np.std() to transforms.Normalize?

Oh, sorry for the confusion. It’s not sample-dependent but dataset-dependent. For example, here you can find the parameters for ImageNet. In short, people compute the mean and std over all the images of the dataset (assuming you are working with pictures and not some other kind of data).
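To make the dataset-level idea concrete, here is a minimal NumPy sketch of the per-channel arithmetic, (x - mean) / std, that transforms.Normalize applies. The constants are the commonly quoted ImageNet statistics for RGB images scaled to [0, 1]. The normalize helper and the random test image are made up for illustration and are not part of torchvision.

```python
import numpy as np

# Commonly quoted ImageNet per-channel statistics (RGB, pixel values in [0, 1]).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize(image, mean, std):
    """Hypothetical helper: apply (x - mean) / std per channel to a C x H x W array."""
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    return (image - mean) / std


# A fake 3-channel image with values in [0, 1), standing in for a real sample.
rng = np.random.default_rng(0)
image = rng.random((3, 4, 4), dtype=np.float32)

# The same fixed statistics are reused for every image in the dataset.
out_imagenet = normalize(image, IMAGENET_MEAN, IMAGENET_STD)

# mean = std = 0.5 simply rescales [0, 1] to [-1, 1]; it is not a dataset statistic.
out_half = normalize(image, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
```

With torchvision, the same constants would be passed once as transforms.Normalize(mean=..., std=...) and applied unchanged to every sample.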

Hi. May I ask how to define the mean and std values for each image channel? My dataset consists of natural satellite images. Should I just use 0.5, or compute the mean and std for each channel of each image?
Moreover, can we set a parameter so that the CNN finds the optimal values (mean, std, or other weights/biases used in each channel) for the image processing? If so, can you tell me how to set that parameter?

Thank you very much. I have compared different image processing strategies, i.e. setting all the mean and std values to 0.5 versus computing the mean and std of each channel over all the images in the dataset. The results indicate that the 0.5 setting is more conducive to improving the final accuracy.

Hi,
First of all, if the dataset is “huge”, you pick a representative subset.
Then you just need to iterate over that set. Ideally you would save all the pixel values (per channel) and compute the mean and std of those.

Since this may be memory-demanding, you can instead save just the mean of each image (per channel) and average those. If all the images are the same size, this is equivalent to the previous formulation. Otherwise it’s an approximation.

Once you have the mean, you can compute the std in an analogous way.
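The procedure above can be sketched roughly as follows. Note that this is a slight variation on what was described, not the exact recipe: instead of storing per-image means, it accumulates per-channel sums, sums of squares, and pixel counts, which keeps memory low and stays exact even when images differ in size. The channel_stats name and the random images are placeholders for illustration; it assumes C x H x W arrays already scaled to [0, 1].

```python
import numpy as np


def channel_stats(images):
    """Per-channel mean and std over all pixels of an iterable of C x H x W arrays."""
    total = None      # running per-channel sum of pixel values
    total_sq = None   # running per-channel sum of squared pixel values
    n_pixels = 0      # number of pixels seen per channel

    for img in images:
        img = np.asarray(img, dtype=np.float64)
        flat = img.reshape(img.shape[0], -1)  # C x (H * W)
        if total is None:
            total = np.zeros(flat.shape[0])
            total_sq = np.zeros(flat.shape[0])
        total += flat.sum(axis=1)
        total_sq += (flat ** 2).sum(axis=1)
        n_pixels += flat.shape[1]

    mean = total / n_pixels
    # Var[x] = E[x^2] - E[x]^2; clip tiny negatives caused by rounding.
    var = np.maximum(total_sq / n_pixels - mean ** 2, 0.0)
    return mean, np.sqrt(var)


# Two fake images of different sizes, standing in for a (subset of a) dataset.
rng = np.random.default_rng(0)
images = [rng.random((3, 8, 8)), rng.random((3, 5, 7))]
mean, std = channel_stats(images)
```

The resulting mean and std (one value per channel) are what you would then pass as fixed values to transforms.Normalize(mean=..., std=...).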