I am trying to use the pretrained resnet152 and vggnet19. In the case of resnet, there are batch normalization layers, which are likely to be invariant to input normalization (e.g. torchvision.transforms.Normalize). However, in the case of vggnet, I think a different input range will produce different results.
What was the range of the input values when you trained the CNNs in torchvision?
The input image is first loaded into the range [0, 1], and then this normalization is applied to the RGB image, as described here:
All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224.
The images have to be loaded into a range of [0, 1] and then normalized using mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225].
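For example, a minimal preprocessing sketch with torchvision (the image filename is just a placeholder):

```python
import torch
from torchvision import transforms
from PIL import Image

# Standard ImageNet preprocessing: ToTensor() converts a PIL image to a
# float tensor in [0, 1]; Normalize() then subtracts the per-channel mean
# and divides by the per-channel std quoted above.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("example.jpg")       # hypothetical input image
batch = preprocess(img).unsqueeze(0)  # shape: (1, 3, 224, 224)
```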
If I am training my own model from scratch (not vgg or resnet), do I still need to normalize the RGB pixels to [0, 1]? Or can I just use [0, 255] like in torch?
Hi, I am a beginner. My inputs are videos, processed and stored in tensors of shape
[batch size, number of channels, number of frames, height, width].
In this case, how can I resize the height and width and perform the required normalization to use resnet? I saw a lot of tutorials, but in all of them they use images.
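A minimal sketch of one way to do this, assuming the pixel values are already scaled to [0, 1] (the tensor shape below is made up for illustration): since a plain 2D resnet expects 4D image batches, you can fold the frame dimension into the batch dimension, resize with F.interpolate, normalize, and fold back:

```python
import torch
import torch.nn.functional as F

# Hypothetical video batch: (batch, channels, frames, height, width);
# the shape here is made up, substitute your own tensors.
video = torch.rand(2, 3, 16, 128, 171)
b, c, t, h, w = video.shape

# A 2D resnet sees single images, so fold the frame dimension into the
# batch dimension: (b, c, t, h, w) -> (b*t, c, h, w).
frames = video.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)

# Resize height/width to 224x224.
frames = F.interpolate(frames, size=(224, 224),
                       mode="bilinear", align_corners=False)

# Same per-channel ImageNet normalization as for images
# (assumes pixel values are already in [0, 1]).
mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)
frames = (frames - mean) / std

# Fold back to (batch, channels, frames, height, width) if needed.
video = frames.reshape(b, t, c, 224, 224).permute(0, 2, 1, 3, 4)
```

Note that this treats every frame as an independent image. torchvision also ships 3D video models (e.g. torchvision.models.video.r3d_18) that accept 5D input directly, but those were trained with their own normalization statistics rather than the image ones above.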