It looks like the images are not normalized when training maskrcnn_resnet50_fpn. See vision/presets.py at main · pytorch/vision · GitHub, where hflip is the default data augmentation for the training recipe defined in vision/references/detection at main · pytorch/vision · GitHub, and no normalization step is included.
Is this intentional? Why is normalization not applied?
Edit: I also see it mentioned here:
Note that these models don’t require the images to be normalized, so we don’t need to use the normalized batch
but I'm still not sure why.