Confused about the image preprocessing in classification

I am confused about the operation of subtracting mean in every image.

  1. If I want to fine-tune a network (pretrained on ImageNet) on my dataset, should I subtract the ImageNet mean or my dataset's mean? I think that if I want to fine-tune on a new dataset I should subtract the new dataset's mean, and if I just want to test on the new dataset I should subtract the ImageNet mean. Am I right?
  2. When calculating the dataset mean and std, should I compute them on the train set only, or on both the train and test sets?
  3. Should I calculate the mean and std on the original images, or on the resized images used for training?
  4. How do I calculate the image mean and std when the dataset is very large? The mean is easy: I just accumulate over the tensors. But how do I calculate the std? I currently concatenate every tensor (after viewing it as shape (3, -1)), but that is not possible for a large dataset; it takes too much memory.
  5. In the PyTorch implementation, the preprocessing is normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]), but in the Caffe implementation the mean is [123, 117, 104]. I guess this is because image values in PyTorch are in [0, 1] and in Caffe they are in [0, 255]. Am I right? But where does std = [0.229, 0.224, 0.225] come from? It doesn't appear in the Caffe implementation. Is the std necessary? Why does the Caffe implementation normalize the data using only the mean and not the std?

Could anyone help figure it out? Thanks!


I’ll help where I can. I might be wrong on a few points.

  1. Ideally the ImageNet mean/std shouldn't be too different from your dataset's mean/std, so either choice should be fine. If you are freezing the initial layers, use the ImageNet mean/std. Exact normalisation to zero mean / unit std is not strictly required anymore because of batch norm.
  2. Just on the train set. It should be as if the model never knew about the test data.
  3. Again, it shouldn't matter.
  4. Just calculate the mean/std of the pixels on a decently sized random sample of the original images.
  5. About the mean, you're right. I'm not sure why Caffe doesn't use a std.
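On point 4, the memory problem goes away if you accumulate a per-channel sum and sum of squares instead of concatenating everything, and derive the std from E[x²] − E[x]² at the end. A minimal numpy sketch (the function name and batch shapes are my own assumptions, not from the thread):

```python
import numpy as np

def running_mean_std(batches):
    """Per-channel mean/std over an iterable of image batches shaped
    (N, 3, H, W), using constant memory: only a running sum and a
    running sum of squares are kept, never the full pixel tensor."""
    n_pixels = 0
    channel_sum = np.zeros(3, dtype=np.float64)
    channel_sq_sum = np.zeros(3, dtype=np.float64)
    for batch in batches:
        # move channels first, then flatten batch and spatial dims together
        flat = batch.transpose(1, 0, 2, 3).reshape(3, -1)
        n_pixels += flat.shape[1]
        channel_sum += flat.sum(axis=1)
        channel_sq_sum += (flat ** 2).sum(axis=1)
    mean = channel_sum / n_pixels
    std = np.sqrt(channel_sq_sum / n_pixels - mean ** 2)
    return mean, std
```

Since each batch is reduced to six numbers before the next one is loaded, this works for arbitrarily large datasets; float64 accumulators keep the E[x²] − E[x]² subtraction numerically safe for image-scale values.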



Exact normalisation to (0, 1) mean/std is not required anymore because of batch norm.

Or even better yet, try using SELU.

@fanq15 There is a relevant discussion about your question here: Normalization in the mnist example

The mean/std used on Imagenet should be generally applicable unless you’re dealing with b/w images or something special. Then you can just use a subset of your training set (randomly sampled) to compute a new mean/std.
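A sketch of that subset approach, assuming some `load_image(path)` function of your own that returns one image as a (3, H, W) float array (both names are hypothetical placeholders):

```python
import numpy as np

def subset_mean_std(load_image, paths, sample_size=1000, seed=0):
    """Estimate per-channel mean/std from a random sample of the
    training images rather than the whole set. `load_image(path)` is a
    stand-in for however you read one image as a (3, H, W) float array."""
    rng = np.random.default_rng(seed)
    chosen = rng.choice(len(paths), size=min(sample_size, len(paths)),
                        replace=False)
    # a modest sample fits in memory even when the full dataset does not
    flat = np.concatenate(
        [load_image(paths[i]).reshape(3, -1) for i in chosen], axis=1)
    return flat.mean(axis=1), flat.std(axis=1)
```

Sampling without replacement from the training paths only, per the earlier replies about keeping the test set out of the statistics.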


Thank you very much!

Thanks! It indeed works for my own dataset.

When I normalize the data with mean=(0.485, 0.456, 0.406) and std=(0.229, 0.224, 0.225) (ImageNet), the values of channels 0 and 2 are transformed to inf. Is that right?

Obs: channel 1 keeps its values.
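No, that isn't expected: Normalize just computes (x − mean) / std per channel, and with the real ImageNet stats finite inputs stay finite. An inf in a channel usually means that channel's std entry ended up as 0 somewhere. A numpy sketch of the arithmetic (the exact transform call isn't shown in the post, so this reproduces the symptom with an assumed zero-std tuple):

```python
import numpy as np

def normalize(img, mean, std):
    """Per-channel (x - mean) / std, the same arithmetic
    transforms.Normalize applies to a CHW tensor."""
    mean = np.asarray(mean, dtype=np.float64).reshape(3, 1, 1)
    std = np.asarray(std, dtype=np.float64).reshape(3, 1, 1)
    return (img - mean) / std

img = np.full((3, 2, 2), 0.5)  # a mid-grey CHW image in [0, 1]

# with the real ImageNet stats, every channel stays finite
ok = normalize(img, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

# a std entry of 0 in channels 0 and 2 blows exactly those channels up to inf
with np.errstate(divide="ignore"):
    bad = normalize(img, [0.485, 0.456, 0.406], [0.0, 0.224, 0.0])
```

So it is worth double-checking how the std tuple is being constructed before it reaches Normalize.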

Sorry for the late reply.

As far as I know:

  1. If you want to fine-tune a pretrained network on your own dataset, you should use the mean and std of the new dataset during training. If you only want to evaluate the pretrained network on your own dataset, I am not sure whether subtracting the ImageNet mean and std will work. :confused:
  2. When calculating the mean and std, the train set alone is enough, because the validation and test sets are only used to measure your model's performance. And calculate them on the original images.
  3. I tried to calculate them on the CPU; it really is expensive and takes a long time on a large dataset, but I think that is okay because it only has to be computed once.
  4. transforms.Normalize should come after torchvision.transforms.functional.to_tensor; once the data is converted to a tensor, its values are in [0, 1].
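A numpy sketch of that pipeline order for point 4, i.e. to_tensor first (uint8 HWC in [0, 255] → float CHW in [0, 1]), then Normalize; the shapes and the all-white test image are illustrative assumptions, not from the thread:

```python
import numpy as np

def to_tensor(img_uint8):
    """Mimics torchvision.transforms.functional.to_tensor:
    HWC uint8 in [0, 255] -> CHW float32 in [0, 1]."""
    return img_uint8.transpose(2, 0, 1).astype(np.float32) / 255.0

def normalize(chw, mean, std):
    """Per-channel (x - mean) / std on a CHW array."""
    mean = np.asarray(mean, dtype=np.float32).reshape(3, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(3, 1, 1)
    return (chw - mean) / std

img = np.full((4, 4, 3), 255, dtype=np.uint8)  # an all-white HWC image
x = to_tensor(img)                             # values now in [0, 1]
y = normalize(x, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
```

If you applied Normalize directly to the uint8 image, the ImageNet mean/std (which assume a [0, 1] range) would be off by a factor of 255, which is why the order matters.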