Maximum image size for Faster R-CNN


I am wondering what the maximum image size is that I can use with a pretrained torchvision fasterrcnn_resnet50_fpn. At the moment I am working with images of shape (3, 1300, 1300). I can’t afford to have the images downsampled, as I would lose critical resolution, but if needed I can generate smaller images (they are synthetic data). I found the max_size argument in the FasterRCNN constructor, with a default value of 1333, and this argument is probably passed on to GeneralizedRCNNTransform at some point. So I’m assuming no downsampling takes place for images up to 1333 pixels, but I would like to be sure.
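As a point of reference, here is a minimal pure-Python sketch of the resize rule, assuming it mirrors the scale computation in torchvision's GeneralizedRCNNTransform: the shorter side is scaled to min_size unless that would push the longer side past max_size. The defaults assumed here are min_size=800 and max_size=1333. resized_shape is a hypothetical helper, not a torchvision function, and torchvision's own rounding may differ by a pixel.

```python
def resized_shape(height, width, min_size=800, max_size=1333):
    """Approximate the (height, width) an image is resized to before the backbone.

    Hypothetical helper mimicking the eval-time scaling rule: scale the
    shorter side to min_size, but cap the scale so the longer side does not
    exceed max_size.
    """
    short_side = min(height, width)
    long_side = max(height, width)
    scale = min(min_size / short_side, max_size / long_side)
    # Round to the nearest pixel; torchvision may floor instead.
    return round(height * scale), round(width * scale), scale


h, w, scale = resized_shape(1300, 1300)
print(h, w, f"{scale:.3f}")  # 800 800 0.615
```

Under this rule a 1300 x 1300 image is downscaled even though both sides are below 1333, because min_size, not max_size, is the binding limit.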



Have you made any progress on this?

I am in a similar situation with FasterRCNN, except that I am using ResNet-101. With my prior implementation I could fill the GPU with batch_size = 1, whereas with this torchvision version I cannot seem to get above 40% VRAM utilization, which to me indicates downsampling.

Does changing the integer max_size argument of class FasterRCNN(GeneralizedRCNN) change anything for you in terms of VRAM usage? I see no change no matter what value I put there.
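To see why changing max_size alone may appear to do nothing, the same hypothetical scaling rule (shorter side scaled to min_size, longer side capped at max_size, defaults assumed to be 800 and 1333) can be evaluated for a 3840 x 2160 frame. This is a sketch of the rule, not torchvision's code, so exact pixel counts may differ slightly.

```python
def resized_shape(height, width, min_size=800, max_size=1333):
    """Hypothetical helper: approximate the post-transform (height, width)."""
    scale = min(min_size / min(height, width), max_size / max(height, width))
    return round(height * scale), round(width * scale)


# A 4K frame with the assumed defaults: max_size is the binding limit.
print(resized_shape(2160, 3840))                 # (750, 1333)
# Raising only max_size: min_size=800 now caps the shorter side instead.
print(resized_shape(2160, 3840, max_size=4000))  # (800, 1422)
# Raising both to the native resolution leaves the frame unscaled.
print(resized_shape(2160, 3840, min_size=2160, max_size=3840))  # (2160, 3840)
```

Going from 750 x 1333 to 800 x 1422 is only about 14% more pixels, which may be why VRAM usage barely moves; the native frame has roughly 8 times the pixels of the default-scaled one.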

My images are 4K (3840 x 2160) and larger, so with my prior implementation they definitely had trouble fitting on my GPU, which has only 6 GB of VRAM. Now I am puzzled about how to maximize the resolution and avoid downsampling at all costs.

If you look at the docs, you will see that it does resize the input. If you look further into the source code, you can see exactly how the resizing works. In short, if you don’t want it to resize, set min_size and max_size to the shape of your input (min_size to the shorter side, max_size to the longer side).