Faster R-CNN image size and weird transform behaviour

Hi, I have been trying to train and test torchvision's Faster R-CNN, constructing the model with the following parameters:

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True, min_size=200, max_size=4000)

but it seems the images are resized during inference even though (I thought) they shouldn't be. The inference image sizes are (912, 912), (1824, 1824), and (3648, 3648).

I believe the images are resized during inference because the time and GPU RAM usage don't make sense otherwise: all three sizes take roughly the same time and memory when I set min_size=200 and max_size=4000, but if I set min_size and max_size close to the original size, e.g. (3648, 3648), the time and RAM usage change noticeably. That is why I conclude the images are being resized.
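I assume I could also confirm this directly by calling the model's internal transform on its own, something like this (a rough sketch; the dummy tensor is just one of my image sizes, and I'm assuming model.transform is the GeneralizedRCNNTransform that does the resizing):

import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    pretrained=True, min_size=200, max_size=4000)
model.eval()

print(model.transform)  # shows the min_size / max_size it will use

# Run a dummy image through the transform alone to see what the
# backbone would actually receive.
img = torch.rand(3, 3648, 3648)
with torch.no_grad():
    image_list, _ = model.transform([img])
print(image_list.tensors.shape)   # resized + padded batch tensor
print(image_list.image_sizes)     # per-image sizes after resizing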

The problem is that, because the generalized transform feels so unpredictable, I don't even know whether my training images end up at the right size or whether the model is learning the right information.

Also, now that I think about it, I didn't use a DataLoader during inference (I read samples straight from the torch.utils.data dataset, as sketched below). Does that affect the inference image size in any way?
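For reference, the two ways I feed images look roughly like this (a sketch, with dummy tensors standing in for my real dataset and targets):

import torch
import torchvision
from torch.utils.data import DataLoader

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    pretrained=True, min_size=200, max_size=4000)
model.eval()

# Dummy tensors stand in for my real dataset samples.
images = [torch.rand(3, 912, 912) for _ in range(4)]

# What I do at inference now: index the dataset directly, no DataLoader.
with torch.no_grad():
    pred_direct = model([images[0]])

# What I do during training: go through a DataLoader with a list-style collate.
loader = DataLoader(images, batch_size=2, collate_fn=lambda batch: list(batch))
batch = next(iter(loader))
with torch.no_grad():
    pred_loader = model(batch)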

Or maybe someone can help me avoid this issue by bypassing the automatic transform altogether? I really feel like the automatic transform is out of scope for Faster R-CNN's responsibility, and this tightly coupled structure makes it hard to use: the transform is wired into GeneralizedRCNN and applied automatically in Faster R-CNN as well. I have tried manually decoupling the transform step, but it feels like a rabbit hole, since other parts of the pipeline depend on the transform's output.
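What I tried so far is along these lines (a rough sketch of the workaround, assuming the resize only looks at the transform's min_size / max_size attributes; normalization still happens either way):

import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

img = torch.rand(3, 3648, 3648)  # stand-in for one of my real images

# Try to make the internal resize a no-op by setting min_size / max_size
# to the image's own dimensions before the forward pass, so the scale
# factor works out to 1.
h, w = img.shape[-2:]
model.transform.min_size = (min(h, w),)
model.transform.max_size = max(h, w)

with torch.no_grad():
    pred = model([img])
print(pred[0]["boxes"].shape)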

Logically, when I set min_size=200 and max_size=4000, the model should accept images anywhere between those sizes unchanged, right? But here is the thing: with min_size=200, max_size=4000 I detect, say, 2 positive objects, while with min_size=3648, max_size=3648 the results change a lot, even though by that logic they shouldn't, right?
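To make the comparison concrete, this is roughly what I'm running (a sketch: the random tensor and the 0.5 score threshold are placeholders for my actual test image and threshold):

import torch
import torchvision

img = torch.rand(3, 3648, 3648)  # stand-in for one of my test images

def count_detections(min_size, max_size, score_thresh=0.5):
    # Build the same model with different min_size / max_size and count
    # the boxes whose score passes the threshold.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
        pretrained=True, min_size=min_size, max_size=max_size)
    model.eval()
    with torch.no_grad():
        pred = model([img])[0]
    return int((pred["scores"] > score_thresh).sum())

print(count_detections(200, 4000))    # the (200, 4000) setting
print(count_detections(3648, 3648))   # the (3648, 3648) setting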