Quick conceptual question that I need to understand.
I noticed that when training the
torchvision.models.detection.ssdlite320_mobilenet_v3_large() and the
torchvision.models.detection.fasterrcnn_mobilenet_v3_large_320_fpn() models, you don't need to explicitly resize images to a specific size for the model to train. My dataset contains images of many different arbitrary sizes that I did not resize, and the model still trained well. So I was wondering what resizing steps, if any, the models perform internally.
I noticed that when loading the FasterRCNN model, it contains this transform block:
Resize(min_size=(320,), max_size=640, mode='bilinear'), and the torchvision documentation says SSDLite takes 320×320 images at inference. This is my progress so far, but I'm stuck because I'm not sure what specific size my model resizes my dataset's images to before they enter the object detection network.
Hopefully you can help with this,