Torchvision fasterrcnn and SSD mobilenet resizing question

SarthakJain · August 4, 2021, 8:25pm

Hi,

Quick conceptual question that I need to understand.
I noticed that when training the torchvision.models.detection.ssdlite320_mobilenet_v3_large() and the torchvision.models.detection.fasterrcnn_mobilenet_v3_large_320_fpn() models, you don’t need to explicitly resize images to a specific size for the model to train. In my dataset, I have images of many different arbitrary sizes that I did not resize, and the model trained well. Therefore, I was wondering what resizing steps the models perform internally, if any.
I noticed in the FasterRCNN that there is this block when loading the model:
Resize(min_size=(320,), max_size=640, mode='bilinear'), and the torchvision documentation says that ssdlite takes images that are 320 at inference. This is as far as I have gotten, but I am stuck, as I'm not sure what specific size my model resizes the images in my dataset to before they enter the object detection network.

Hopefully you can help with this,
Thanks,
Sarthak Jain

Dipam_Vasani · January 30, 2022, 7:55pm

You can always step into the code on GitHub and check for yourself. For example, the first component in FasterRCNN is a transform object (model.transform), which transforms the images and targets. If you step into that, you will see the resize logic.
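
To make that concrete, here is a rough sketch of the shorter-side/longer-side rule that the Resize(min_size=(320,), max_size=640) line from the question suggests. It is plain Python, not the torchvision code itself: the function name resized_shape is made up for illustration, and the exact rounding is an assumption, so check the transform source for your torchvision version before relying on the numbers.

```python
import math


def resized_shape(height, width, min_size=320, max_size=640):
    """Estimate (height, width) after a min_size/max_size style resize.

    Hypothetical helper for illustration only. The default values are taken
    from the Resize(...) repr quoted in the question; the floor rounding is
    an assumption about how the output size is computed.
    """
    # Scale so that the shorter side reaches min_size ...
    scale = min_size / min(height, width)
    # ... unless that would push the longer side beyond max_size.
    scale = min(scale, max_size / max(height, width))
    # Aspect ratio is preserved because both sides use the same scale.
    return math.floor(height * scale), math.floor(width * scale)


for shape in [(640, 1280), (640, 2560), (160, 160)]:
    print(shape, "->", resized_shape(*shape))
```

Under this rule there is no single fixed output size: each image keeps its aspect ratio, so the result depends on the input shape. To see the values your own model actually uses, print(model.transform) shows the same Resize line quoted in the question.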