How to preprocess input for pre-trained networks?

Hi all,
I was wondering: when using the pretrained networks in the torchvision.models module, what preprocessing should be applied to the input images?
For instance, I remember that for the 19-layer VGG you should subtract the following means: [103.939, 116.779, 123.68].
Where can I find these numbers (ideally with std values too) for alexnet, resnet and squeezenet?

Thank you very much

All pretrained torchvision models have the same preprocessing, which is to normalize using the following mean/std values: https://github.com/pytorch/examples/blob/97304e232807082c2e7b54c597615dc0ad8f6173/imagenet/main.py#L197-L198 (input is RGB format)

Thank you very much!

Hi, it looks like the pixel intensities are rescaled to [0, 1] before normalization. Is that right?

@qianguih Yes, the RGB values have to be scaled to [0, 1] before applying the normalization that I pointed out.
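To make the two steps concrete, here is a minimal sketch in plain PyTorch. It assumes an 8-bit RGB image stored as a uint8 tensor of shape (3, H, W). The mean/std values are the ImageNet ones quoted elsewhere in this thread; the helper name `preprocess` and the random test image are only illustrative. For a PIL image, `transforms.ToTensor()` followed by `transforms.Normalize(mean, std)` from torchvision should give the same result.

```python
import torch

# ImageNet channel statistics (RGB order), as quoted in this thread.
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406])
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225])


def preprocess(img_uint8: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: uint8 RGB image (3, H, W) -> normalized float tensor."""
    # Step 1: rescale pixel intensities from [0, 255] to [0, 1].
    x = img_uint8.float() / 255.0
    # Step 2: subtract the per-channel mean and divide by the per-channel std.
    return (x - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]


# Illustrative input: a random 224x224 RGB image.
img = torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8)
out = preprocess(img)
print(out.shape, out.dtype)
```

Skipping step 1 and feeding values in [0, 255] straight into step 2 would produce inputs far outside the range the pretrained weights were trained on.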

I see. Thank you very much!

This is important information. I wonder why it is only in the example code and not in the docs.

Agreed. If it wasn’t for this thread, I would have missed this important Normalization step for sure. It would be nice if it could be added to the documentation.

This is pretty key information. Without doing this, and only doing mean centering and stddev normalization of the original Hounsfield units, I need to keep batch normalization enabled during test to see reasonable results from my volumetric segmentation network.

This should really be in bold somewhere.

Can we put the mean and std inside the resnet model? We would just need to register a buffer.
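A rough sketch of that idea, written as a wrapper rather than a change to torchvision's ResNet itself. The class name `NormalizedModel` is made up for illustration, and `nn.Identity` stands in for a pretrained network so the snippet runs without downloading weights.

```python
import torch
import torch.nn as nn


class NormalizedModel(nn.Module):
    """Hypothetical wrapper: normalizes inputs in [0, 1], then calls the wrapped model."""

    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model
        # Buffers follow .to(device) and are stored in state_dict, but are not trained.
        self.register_buffer("mean", torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1))
        self.register_buffer("std", torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x is expected as (N, 3, H, W), RGB, already scaled to [0, 1].
        return self.model((x - self.mean) / self.std)


# nn.Identity stands in for a pretrained network such as a ResNet.
wrapped = NormalizedModel(nn.Identity())
batch = torch.rand(2, 3, 8, 8)
print(wrapped(batch).shape)
```

With this layout the data pipeline only has to scale images to [0, 1]. Since the buffers become part of the wrapper's state_dict, the pretrained weights would be loaded into the inner model before wrapping it.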

When I normalize the data with mean=(0,485, 0,456, 0,406) and std=(0,229, 0,224, 0,225) (ImageNet), the values of channels 0 and 2 become inf. Is that expected?

Of course not.
Obviously, you have ‘division by zero’ somewhere. Try to debug the code to figure out the source of error.
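One possible source, offered only as a guess since the poster's code is not shown: if the values were typed in the code with decimal commas, as they are written in the post, Python reads each tuple as six integers. A normalization that walks over the channels one by one would then use 0 as the std for channels 0 and 2. The sketch below assumes such a per-channel loop; it does not call torchvision.

```python
import torch

# Decimal commas make Python read six integers instead of three floats.
mean = (0,485, 0,456, 0,406)   # -> (0, 485, 0, 456, 0, 406)
std = (0,229, 0,224, 0,225)    # -> (0, 229, 0, 224, 0, 225)

x = torch.full((3, 2, 2), 0.5)  # dummy 3-channel image with values in [0, 1]

# Assumed per-channel normalization; zip stops after the three channels,
# so only the first three entries of mean and std are used.
channels = [(t - m) / s for t, m, s in zip(x, mean, std)]
out = torch.stack(channels)

# Channels 0 and 2 are divided by 0 and become inf; channel 1 stays finite.
print(torch.isinf(out).flatten(1).all(dim=1))
```

Writing the values with decimal points, e.g. mean=(0.485, 0.456, 0.406), avoids the zero entries.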

I have read my code many times and I can’t figure out where the error is. If I change the mean and the std, everything works fine and all channels (0, 1, 2) keep their values. I have read about throwing away the ImageNet normalization and keeping the weights. What do you think?

Shouldn’t the mean and std be tuples?

normalize = torchvision.transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225) )

Or is it OK to use lists, as in your case?

I have tested both. As far as I can see, the only difference between a list and a tuple is immutability, so I don’t know how that could be causing this behavior.

Maybe you can remove this normalization to see if you are still having these inf values.
If yes, this means the source of error is somewhere else than Normalize.
If not, could you try a dummy normalization with
normalize = torchvision.transforms.Normalize(mean=(1, 1, 1), std=(1, 1, 1) )
?
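A small sketch of that check, assuming the image is available as a (3, H, W) float tensor. The helper `report_nonfinite` is hypothetical, and the dummy normalization is written out by hand with mean 1 and std 1 rather than through torchvision.

```python
import torch


def report_nonfinite(name: str, t: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: count inf/nan values per channel of a (3, H, W) tensor."""
    bad = (~torch.isfinite(t)).flatten(1).sum(dim=1)
    print(f"{name}: non-finite values per channel = {bad.tolist()}")
    return bad


x = torch.rand(3, 4, 4)            # stand-in for a loaded image in [0, 1]
before = report_nonfinite("before normalize", x)

y = (x - 1.0) / 1.0                # dummy normalization: mean=1, std=1 for every channel
after = report_nonfinite("after dummy normalize", y)
```

If the counts are already non-zero before normalization, the problem lies in loading or earlier transforms; if they only appear afterwards, the mean/std values are the place to look.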

If I use the pretrained model on ImageNet and fine-tune it on my own dataset, should I re-calculate the mean and std with my own dataset?

Using the ImageNet mean and std is pretty standard practice. Since they are calculated from about a million images, the statistics are pretty stable. Also, the pretrained model was trained with the ImageNet mean and std. I do not recommend replacing them with the mean and std of your own small dataset.

Thank you for your comment. I had the same doubt a while ago. So, as a summary of the comments, the best practice is to consider:

Stability of our images (mean/std)
Similarity of our image dataset to ImageNet

in our specific task and dataset when deciding how best to normalise.
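To judge how close a dataset is to ImageNet in this respect, one option is to compute its per-channel statistics and compare them. The following is a sketch under assumptions: images are batched as a (N, 3, H, W) float tensor in [0, 1], the helper `channel_stats` is hypothetical, and random data stands in for a real dataset.

```python
import torch

IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406])
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225])


def channel_stats(images: torch.Tensor):
    """Hypothetical helper: per-channel mean/std of a (N, 3, H, W) batch in [0, 1]."""
    # Bring the channel axis first, then flatten everything else per channel.
    flat = images.transpose(0, 1).reshape(3, -1)
    return flat.mean(dim=1), flat.std(dim=1)


images = torch.rand(16, 3, 32, 32)  # stand-in for your own dataset
mean, std = channel_stats(images)
print("abs. difference in mean:", (mean - IMAGENET_MEAN).abs().tolist())
print("abs. difference in std: ", (std - IMAGENET_STD).abs().tolist())
```

Small differences suggest the ImageNet values are a reasonable default; very different statistics (for example, non-photographic data) are the case where the choice deserves an experiment.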

Actually, this link points to correct lines: