I am trying to use the pretrained resnet152 and vggnet19. In the case of resnet, there are batch normalization layers, which are likely to be invariant to input normalization (e.g. torchvision.transforms.Normalize). However, in the case of vggnet, I think a different input range will produce different results.
What was the range of the input values when you trained the CNNs in torchvision?
The input image is first loaded into the range [0, 1], and then this normalization is applied to the RGB image, as described here:
All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224.
The images have to be loaded into a range of [0, 1] and then normalized using mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225].
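For example, a minimal preprocessing sketch with torchvision (the image filename is just a placeholder):

```python
import torch
from torchvision import transforms
from PIL import Image

# Standard ImageNet preprocessing: ToTensor() converts a PIL image to a
# float tensor in [0, 1]; Normalize() then subtracts the per-channel mean
# and divides by the per-channel std quoted above.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("example.jpg")       # hypothetical input image
batch = preprocess(img).unsqueeze(0)  # shape: (1, 3, 224, 224)
```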
If I am training my own model from scratch (not vgg or resnet), do I still need to normalize the RGB pixels to [0, 1]? Or can I just use [0, 255] like in torch?
Hi, I am a beginner. My inputs are videos, processed and stored in tensors of shape
[batch size, number of channels, number of frames, height, width].
In this case, how can I resize the height and width and perform the required normalization to use resnet? I saw a lot of tutorials, but in all of them they use images.
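A minimal sketch of one way to do this, assuming the pixel values are already scaled to [0, 1] (the tensor shape below is made up for illustration): since a plain 2D resnet expects 4D image batches, you can fold the frame dimension into the batch dimension, resize with F.interpolate, normalize, and fold back:

```python
import torch
import torch.nn.functional as F

# Hypothetical video batch: (batch, channels, frames, height, width);
# the shape here is made up, substitute your own tensors.
video = torch.rand(2, 3, 16, 128, 171)
b, c, t, h, w = video.shape

# A 2D resnet sees single images, so fold the frame dimension into the
# batch dimension: (b, c, t, h, w) -> (b*t, c, h, w).
frames = video.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)

# Resize height/width to 224x224.
frames = F.interpolate(frames, size=(224, 224),
                       mode="bilinear", align_corners=False)

# Same per-channel ImageNet normalization as for images
# (assumes pixel values are already in [0, 1]).
mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)
frames = (frames - mean) / std

# Fold back to (batch, channels, frames, height, width) if needed.
video = frames.reshape(b, t, c, 224, 224).permute(0, 2, 1, 3, 4)
```

Note that this treats every frame as an independent image. torchvision also ships 3D video models (e.g. torchvision.models.video.r3d_18) that accept 5D input directly, but those were trained with their own normalization statistics rather than the image ones above.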