The documentation says that we should use the same normalization as was used for the ImageNet images, i.e.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
But what if my dataset images have a slightly different mean and std and I want to train a pretrained (on ImageNet) model on my dataset?
Should I use my own normalization or the above one from ImageNet?
Does it really matter which one I use if the differences in mean and std are small?
If you want to simply use the pretrained model on your images (although that would imply having the ImageNet classes as output), then normalize your images with the ImageNet stats.
However, if you want to finetune a pretrained model using your images as training data, then I think it is best to normalize using your stats.
I have had experience with large differences between two datasets, which had an enormous impact, but I don’t know about very small differences.
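To get a rough feel for what a small difference means in practice: Normalize computes (x - mean) / std per channel, so switching stats only shifts and rescales the input a little. Here is a minimal sketch of that comparison. The `my_mean` / `my_std` values are made up for illustration, and `normalize` is just a hand-written stand-in for `transforms.Normalize`, not the library code.

```python
import torch

# ImageNet statistics quoted in the question.
imagenet_mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
imagenet_std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# Hypothetical statistics of a custom dataset (illustrative values only).
my_mean = torch.tensor([0.500, 0.450, 0.400]).view(3, 1, 1)
my_std = torch.tensor([0.240, 0.230, 0.220]).view(3, 1, 1)


def normalize(img, mean, std):
    # Same arithmetic as transforms.Normalize: (x - mean) / std per channel.
    return (img - mean) / std


torch.manual_seed(0)
img = torch.rand(3, 8, 8)  # stand-in for an RGB image scaled to [0, 1]

a = normalize(img, imagenet_mean, imagenet_std)
b = normalize(img, my_mean, my_std)

# Largest per-pixel gap between the two normalized versions of the same image.
max_diff = (a - b).abs().max().item()
print(f"max abs difference: {max_diff:.3f}")
```

With stats this close, the two inputs differ by a small per-channel affine map, which is probably easy for fine-tuning to absorb. Whether it matters for a particular model is still something to check empirically.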
Thanks, that makes sense. And yes, I want to fine-tune the pretrained model (for which I replaced the classifier with my own, which has fewer classes).
Another question: to compute the mean and std, should I use the raw images or the images after applying the same transforms that I am going to use for the training dataset (such as RandomRotation, ColorJitter etc.)?
The problem is that, due to the randomness in such transformations, the mean and the std will not be the same at each training iteration. On the other hand, if I use the mean and the std of the raw images, then during training the model sees transformed images whose mean and std differ from those of the raw images.
Right, good question.
My first intuition would be to compute them on the raw images, because the transforms are random and would most likely not be reproduced exactly during training (unless you use a seed, which sort of breaks the randomness).
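For computing the stats on the raw images, something like the following could work. It is only a sketch: `channel_stats` is a hypothetical helper (not part of torchvision), it assumes float batches shaped (N, C, H, W) with at least one batch, and the random tensors stand in for un-augmented images coming out of a DataLoader.

```python
import torch


def channel_stats(batches):
    """Per-channel mean and (population) std over (N, C, H, W) float batches."""
    count = 0
    total = None
    total_sq = None
    for batch in batches:
        batch = batch.double()  # accumulate in float64 to limit rounding error
        if total is None:
            total = torch.zeros(batch.shape[1], dtype=torch.float64)
            total_sq = torch.zeros_like(total)
        total += batch.sum(dim=(0, 2, 3))
        total_sq += (batch ** 2).sum(dim=(0, 2, 3))
        count += batch.shape[0] * batch.shape[2] * batch.shape[3]
    mean = total / count
    # Var[x] = E[x^2] - E[x]^2; clamp guards against tiny negative values.
    std = (total_sq / count - mean ** 2).clamp(min=0).sqrt()
    return mean.float(), std.float()


torch.manual_seed(0)
# Stand-in for batches of raw (un-augmented) images scaled to [0, 1].
raw_batches = [torch.rand(4, 3, 16, 16) for _ in range(5)]
mean, std = channel_stats(raw_batches)
print(mean, std)
```

Accumulating sums over batches avoids loading the whole dataset into memory, and the resulting `mean` and `std` can be passed to `transforms.Normalize` as lists.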
Regarding ColorJitter and, more generally, all transforms that act on the luminance etc. of the images: they don’t yet support ranged bounds (unless you compile torchvision from master), so the randomness will always be centered on the mean value of the raw images (+/- the parameter given).
For RandomRotation and, more generally, all transforms that could pad the images with black borders: try to replace the black border with the mean value of the raw images instead, so that it doesn’t interfere with the statistics.
I think the general idea here is the same as before: compute the statistics for whatever is going to be your use case, so that during evaluation/inference of the model, the test/validation images (which wouldn’t be augmented) have a similar distribution!