What input value range do the pretrained resnet152 and vgg19 expect?

I am trying to use the pretrained resnet152 and vgg19. ResNet has batch normalization layers, which are likely to make it invariant to input normalization (e.g. torchvision.transforms.Normalize). For VGG, however, I think a different input range will produce different results.

What was the range of the input values when the CNNs in torchvision were trained?

Is it [-1.0, 1.0] or [0, 1]?

The input image is first loaded into the range [0, 1], and then this normalization is applied to the RGB image, as described here:

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224.

The images have to be loaded into a range of [0, 1] and then normalized using mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225].

An example of such normalization can be found in the imagenet example here: https://github.com/pytorch/examples/blob/42e5b996718797e45c46a25c55b031e6768f8440/imagenet/main.py#L89-L101
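
To make the two steps concrete, here is a minimal, framework-free sketch of the arithmetic (scale to [0, 1], then subtract the mean and divide by the std per channel). The names `IMAGENET_MEAN`, `IMAGENET_STD` and `normalize_pixel` are made up for illustration; in practice the same thing is usually done with `transforms.ToTensor()` followed by `transforms.Normalize(mean, std)`.

```python
# Illustrative constants; the values come from the documentation quoted above.
IMAGENET_MEAN = (0.485, 0.456, 0.406)  # per-channel mean (R, G, B)
IMAGENET_STD = (0.229, 0.224, 0.225)   # per-channel std (R, G, B)


def normalize_pixel(rgb_uint8):
    """Map one 8-bit RGB pixel to the values fed to a pretrained model."""
    # Step 1: load into the range [0, 1].
    scaled = [v / 255.0 for v in rgb_uint8]
    # Step 2: subtract the channel mean and divide by the channel std.
    return [(x - m) / s for x, m, s in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]


# The darkest and brightest possible pixels bound the normalized range.
darkest = normalize_pixel((0, 0, 0))
brightest = normalize_pixel((255, 255, 255))
print("min per channel:", [round(v, 3) for v in darkest])
print("max per channel:", [round(v, 3) for v in brightest])
```

So after normalization the values are neither in [0, 1] nor in [-1, 1]; they span roughly -2.1 to 2.6, depending on the channel.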

Wow, this is very nice information. Thanks!

If I am training my own model from scratch (not VGG or ResNet), do I still need to normalize the RGB pixels to [0, 1]? Or can I just keep [0, 255], as in Torch?

Either is OK. By subtracting the proper mean values, the two methods can reach similar performance.
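
A small sketch of why the two ranges can be interchangeable: if the mean and the std are both rescaled by 255 (the std part is an assumption here, the reply above only mentions the mean), normalizing a [0, 255] pixel gives the same numbers as normalizing its [0, 1] version. The ImageNet statistics are used only as example values; for a model trained from scratch you would probably use statistics from your own data. `normalize` is a hypothetical helper, not a library function.

```python
# Example statistics for pixels in [0, 1] (ImageNet values, for illustration only).
MEAN_01 = (0.485, 0.456, 0.406)
STD_01 = (0.229, 0.224, 0.225)

# The same statistics expressed for pixels in [0, 255].
MEAN_255 = tuple(m * 255.0 for m in MEAN_01)
STD_255 = tuple(s * 255.0 for s in STD_01)


def normalize(pixel, mean, std):
    """Per-channel (value - mean) / std."""
    return [(v - m) / s for v, m, s in zip(pixel, mean, std)]


pixel_255 = (124, 116, 104)                      # an arbitrary 8-bit RGB pixel
pixel_01 = tuple(v / 255.0 for v in pixel_255)   # the same pixel in [0, 1]

from_01 = normalize(pixel_01, MEAN_01, STD_01)
from_255 = normalize(pixel_255, MEAN_255, STD_255)

# Largest per-channel disagreement between the two conventions.
max_diff = max(abs(a - b) for a, b in zip(from_01, from_255))
print("max difference:", max_diff)
```

The difference is at the level of floating-point rounding, so the network sees effectively the same input either way. What matters is that the statistics match the range the pixels are in.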

Why are you using these numbers, [0.485, 0.456, 0.406], for normalization?

Did you find out? I'm wondering the same.

It's just a mean computed from a random subset of ImageNet pictures. See here for a mean/std computation that was done in Torch.

When porting to PyTorch, they just kept the magic numbers and left out the tedious mean/std computation part.
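
For anyone who wants to derive such numbers for their own dataset, here is a rough sketch of a per-channel mean/std computation. It is not the original Torch code; it assumes the statistics are taken over every pixel of a sample of images already scaled to [0, 1], and `channel_stats` and the tiny `sample` are made up for illustration.

```python
import math


def channel_stats(images):
    """Per-channel mean and (population) std over all pixels of all images.

    images: iterable of images, each a list of (r, g, b) pixels in [0, 1].
    """
    count = 0
    total = [0.0, 0.0, 0.0]     # running sum per channel
    total_sq = [0.0, 0.0, 0.0]  # running sum of squares per channel
    for image in images:
        for pixel in image:
            count += 1
            for c, v in enumerate(pixel):
                total[c] += v
                total_sq[c] += v * v
    mean = [t / count for t in total]
    # std = sqrt(E[x^2] - E[x]^2); max() guards against tiny negative rounding errors.
    std = [math.sqrt(max(sq / count - m * m, 0.0)) for sq, m in zip(total_sq, mean)]
    return mean, std


# A toy "dataset": two images of two pixels each.
sample = [
    [(0.0, 0.2, 0.4), (1.0, 0.6, 0.4)],
    [(0.5, 0.4, 0.4), (0.5, 0.4, 0.4)],
]
mean, std = channel_stats(sample)
print("mean:", mean)
print("std:", std)
```

On a real dataset you would run this over a random subset of the training images and then pass the resulting values to the normalization step.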

Hi,
I am a beginner. My inputs are videos, processed and stored in tensors of shape
[batch size, number of channels, number of frames, height, width].
In this case, how can I resize the height and width and apply the required normalization to use ResNet? I have seen a lot of tutorials, but all of them use images.