For Faster/Mask R-CNN, the min and max sizes of the input images are fixed before training, determined to a great extent by the available CUDA memory. This is fine.
What I found at inference time is that the accuracy of the model strongly depends on the size of the input image. By default, min_size=800 and max_size=1333, but as I varied both hyperparameters I got either better or worse results. I think the best results are obtained when min_size and max_size are close to the input image's size, but I'm not sure.
So this is my question: is there a reliable way to find the optimal input image size?