I’m trying to use the pre-trained Faster R-CNN in PyTorch. I found that the
torchvision package provides a Faster R-CNN ResNet-50 FPN pre-trained network. Since it uses ResNet-50 as its feature extractor, I assumed the preprocessing was the same as for the ResNet models in
torchvision, i.e. the ImageNet normalization below:
from torchvision import transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
However, I’m getting very poor predictions from the model: very few cars are detected and many of the obvious elements in the scene are missed. What is the correct preprocessing for Faster R-CNN?
The detection models normalize the images internally, as seen in this line of code.
You could pass your own calculated stats or just use the default ImageNet stats.
Could you remove your normalization and rerun the code?
Removing the normalization did the trick; more cars and traffic lights are being detected now. Thanks!
Hi. Were you able to get correct results out of the model? It seems that the targets are always modified by a simple forward pass. Check this. If possible, can you tell me whether I am doing something wrong or whether it is a bug? You can comment on that post. Please excuse any mistakes; I am new to this.
@chhaya_kumar_das I didn’t do any training, only evaluation, so I didn’t explore how the targets are being resized. Playing around with the model on Colab suggests it might be a bug, because my target boxes are also growing rapidly (instead of shrinking). I used random values, though, so it might just be the bogus data throwing off the training. I can’t be sure.
I have found a workaround. Check this and see if it works.
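In case the link goes stale: the general idea (a sketch under my assumptions, not necessarily the linked code) is to deep-copy the targets before each forward pass, so that any in-place modification inside the model’s transform cannot touch your original dicts:

```python
import copy
import torch

def train_step(model, images, targets):
    # Defensive copy: if the model's internal transform rescales the
    # target boxes in place, only the copies are mutated, and the
    # caller's targets stay intact across epochs.
    safe_targets = [copy.deepcopy(t) for t in targets]
    loss_dict = model(images, safe_targets)
    return sum(loss_dict.values())
```

This costs a small copy per batch but keeps the training loop correct regardless of whether the torchvision version you are on mutates the targets.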