Maskrcnn_resnet50_fpn normalization

It looks like the images are not normalized when training `maskrcnn_resnet50_fpn`. See vision/ at main · pytorch/vision · GitHub, where `hflip` is the default augmentation for the training recipe defined here: vision/references/detection at main · pytorch/vision · GitHub

Is this intentional? Why is normalization not applied?

Edit: I also see it mentioned here:

> Note that these models don't require the images to be normalized, so we don't need to use the normalized batch

but I'm still not sure why.
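If I'm reading the source correctly, my current guess is that the normalization happens inside the model itself: the torchvision detection models wrap their inputs in a `GeneralizedRCNNTransform` that resizes and normalizes each image before it reaches the backbone, so the training recipe doesn't need a separate `Normalize` step. Here is a plain-Python sketch of the per-channel normalization I believe that transform applies (the ImageNet mean/std defaults below are my assumption based on the torchvision source, and `normalize` is my own illustrative helper, not the actual implementation):

```python
# Assumed defaults passed to GeneralizedRCNNTransform (ImageNet statistics).
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

def normalize(image, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Per-channel (x - mean) / std on a nested [C][H][W] list of floats in [0, 1]."""
    return [
        [[(px - m) / s for px in row] for row in channel]
        for channel, m, s in zip(image, mean, std)
    ]

# Tiny 3-channel, 1x2 "image" just to show the arithmetic.
img = [[[0.5, 1.0]], [[0.0, 0.456]], [[0.406, 0.8]]]
out = normalize(img)
```

So if this reading is right, passing already-normalized images would actually normalize them twice.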