Hi all,
I was wondering: when using the pretrained networks from the torchvision.models module, what preprocessing should be done on the input images we feed them?
For instance, I remember that if you use VGG-19 you should subtract the following means: [103.939, 116.779, 123.68].
Where can I find these numbers (and, even better, the std values) for alexnet, resnet and squeezenet?
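For the torchvision pretrained models, the documented values are the ImageNet statistics: mean [0.485, 0.456, 0.406] and std [0.229, 0.224, 0.225], applied after scaling the image to [0, 1]. A minimal sketch of the usual pipeline (the resize/crop sizes are the common 256/224 convention, not anything mandated by the models):

```python
import torchvision.transforms as transforms

# Standard preprocessing for torchvision's ImageNet-pretrained models.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),  # converts a PIL image to a float tensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```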
Agreed. If it wasn’t for this thread, I would have missed this important Normalization step for sure. It would be nice if it could be added to the documentation.
This is pretty key information. Without doing this, and only doing mean centering and stddev normalization of the original Hounsfield units, I need to keep batch normalization enabled during test to see reasonable results from my volumetric segmentation network.
Maybe you can remove this normalization to see if you are still getting these inf values.
If yes, the source of the error is somewhere other than Normalize.
If not, could you try a dummy (identity) normalization with normalize = torchvision.transforms.Normalize(mean=(0., 0., 0.), std=(1., 1., 1.))?
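A minimal sketch of that check (the random tensor here is just a stand-in for a real 3-channel input):

```python
import torch
import torchvision.transforms as transforms

# Identity normalization: subtract 0, divide by 1, so values pass through unchanged.
identity_norm = transforms.Normalize(mean=(0., 0., 0.), std=(1., 1., 1.))

x = torch.rand(3, 224, 224)     # stand-in for a real input image tensor
y = identity_norm(x)

# An identity Normalize cannot produce inf from finite inputs, so if inf
# still shows up downstream, the problem is elsewhere in the pipeline.
print(torch.isfinite(y).all())  # expect tensor(True)
```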
Using the ImageNet mean and std is pretty standard practice. Since they are calculated over more than a million images, the statistics are very stable. Also, the pretrained model was trained using the ImageNet mean and std. I do not recommend changing the mean and std to those of your small dataset.
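To make the comparison concrete, this is a sketch of how per-channel statistics would be computed over a dataset (the random tensor is a placeholder for real data loaded into memory):

```python
import torch

# Placeholder stack standing in for a whole dataset, values scaled to [0, 1].
images = torch.rand(1000, 3, 224, 224)

# Per-channel mean/std over all images and pixels.
mean = images.mean(dim=(0, 2, 3))
std = images.std(dim=(0, 2, 3))
print(mean, std)

# On ImageNet these come out near mean=[0.485, 0.456, 0.406] and
# std=[0.229, 0.224, 0.225]; a small dataset gives noisy estimates,
# which is one reason to keep the ImageNet values with pretrained weights.
```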
Thank you for your comment. I had the same doubt a while ago. So, to summarize what I infer from the comments, the best practice is to weigh:
Stability of our images' statistics (mean/std)
Similarity of our image dataset to ImageNet
for our specific task and dataset, to decide what the best normalization option is.
@smth I am confused about using the same mean and std for all data. Why can't we normalize per image, so that each image is normalized independently?
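For concreteness, per-image normalization would look something like this sketch (illustrative only; the pretrained torchvision models were trained with the fixed ImageNet statistics, so per-image-normalized inputs change the input distribution they expect):

```python
import torch

def per_image_normalize(img: torch.Tensor) -> torch.Tensor:
    # Normalize a single CxHxW image by its own per-channel statistics.
    mean = img.mean(dim=(1, 2), keepdim=True)
    std = img.std(dim=(1, 2), keepdim=True)
    return (img - mean) / (std + 1e-8)  # epsilon guards against flat images
```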