Different networks accept input of different sizes. The usual approach is to rescale the image to the size the pretrained model expects. Some researchers consider this a problem, though, because rescaling can distort the image content. To address it, methods such as Spatial Pyramid Pooling (SPP) have been proposed.
This is a paper that uses this method.
A network typically has two parts: a feature extractor (convolutional layers) and a classifier (fully connected layers). This method sits between the two parts and makes the network independent of the input size. As far as I remember, though, it slows the network down considerably.
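To make the idea concrete, here is a minimal sketch of pyramid-style pooling placed between the convolutional part and the fully connected part. It is pure Python on a single 2-D feature map, and the function name `spatial_pyramid_pool` and the grid sizes `(1, 2, 4)` are my own assumptions for illustration, not taken from the paper or any library. The point is only that the output length depends on the grid sizes, not on the height and width of the feature map.

```python
import math


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool one 2-D feature map into a fixed-length vector.

    feature_map: list of rows (H x W), with H >= 1 and W >= 1.
    levels: hypothetical grid sizes; level n contributes n * n values.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor/ceil boundaries keep every bin non-empty,
            # even when the map is smaller than the grid.
            top = (i * height) // n
            bottom = max(top + 1, math.ceil((i + 1) * height / n))
            for j in range(n):
                left = (j * width) // n
                right = max(left + 1, math.ceil((j + 1) * width / n))
                pooled.append(max(
                    feature_map[r][c]
                    for r in range(top, bottom)
                    for c in range(left, right)
                ))
    return pooled


# Two feature maps of different sizes give vectors of the same length.
small = [[r * 5 + c for c in range(5)] for r in range(3)]     # 3 x 5
large = [[r * 13 + c for c in range(13)] for r in range(9)]   # 9 x 13
print(len(spatial_pyramid_pool(small)), len(spatial_pyramid_pool(large)))
# 21 21
```

With grids of 1, 2 and 4 cells per side, every feature map yields 1 + 4 + 16 = 21 values, so the fully connected layers always see the same input length. In a real network this would be applied to each channel of the last convolutional layer's output.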