You should use the same normalization that was used during training. Since ImageNet-pretrained models were trained with the ImageNet stats, you should use those stats whenever you use a model pretrained on ImageNet.

If finetuning means you alter the number of classes of the classification model, or you are doing semantic segmentation on top of an ImageNet-pretrained backbone, you should still use the ImageNet stats.

Let’s take your first example of altering the number of classes. Since the weights in the head of the network are not pretrained (they are initialized randomly), shouldn’t gradient descent be smart enough to modify the parameters in those new layers to adapt accordingly?

Do you use an ImageNet-pretrained network? If the answer is yes, you should normalize your input images with the ImageNet stats.

If the answer is no, you should not use the ImageNet stats; instead, use the stats computed from your own training images.
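For the "yes" case, the widely published ImageNet channel means and stds are applied per channel. A minimal numpy sketch of the arithmetic (in a real pipeline you would typically use something like torchvision's `transforms.Normalize` instead):

```python
import numpy as np

# Widely published ImageNet channel statistics (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def normalize(img):
    """Normalize an HxWx3 float image with pixel values in [0, 1]."""
    return (img - IMAGENET_MEAN) / IMAGENET_STD

# Toy input: a uniform mid-gray image.
img = np.full((4, 4, 3), 0.5)
out = normalize(img)
```

The same per-channel subtraction and division is what the framework transform does under the hood.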

Let’s say you have 10,000 training images and 1,000 validation images.
You may calculate the mean and standard deviation of your 10,000 training images per channel (all 3 channels if your images are RGB), and you should use those values to normalize both the training and the validation set.

It won’t be wrong to use all 11,000 images to calculate the mean and std and use that.
In fact it would be even better.

But once you get these numbers you need to stick with them.
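A minimal sketch of computing those channel-wise stats and reusing them, assuming the images are already loaded as float arrays in [0, 1] (random data stands in for the actual training set):

```python
import numpy as np

def channel_stats(images):
    """Per-channel mean and std over a stack of images.

    `images` has shape (N, H, W, 3); averaging over the first
    three axes leaves one value per channel.
    """
    mean = images.mean(axis=(0, 1, 2))
    std = images.std(axis=(0, 1, 2))
    return mean, std

# Hypothetical stand-in for the training images.
rng = np.random.default_rng(0)
train_images = rng.random((100, 8, 8, 3))

mean, std = channel_stats(train_images)

# Stick with these numbers: apply the SAME mean/std to both
# the training and the validation images.
normalized_train = (train_images - mean) / std
```

After this, the normalized training set has roughly zero mean and unit std per channel, and the validation set is shifted by the same fixed amounts rather than by its own stats.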

@dejanbatanjac, please help me understand this: "It won’t be wrong to use all 11,000 images to calculate the mean and std and use that. In fact it would be even better."
Until now I have only seen people use the mean and std of the train set. How would computing them over the entire dataset work?
Thanks in advance.