Training a Pre-Trained RetinaNet on Different Image Sizes

I have a dataset of 1000 x 1000 x 3 images and I want to detect only one class of objects. As far as I can tell, the default input size of the ResNet50-based RetinaNet is smaller than this (torchvision resizes with min_size=800 and max_size=1333 by default). What changes should be made to the pre-trained model to adapt it to larger images? I have tried the following changes:

import torchvision

# Backbone pre-trained on ImageNet, detection heads untrained; num_classes=2 counts the background
model = torchvision.models.detection.retinanet_resnet50_fpn(pretrained=False, pretrained_backbone=True, num_classes=2)
model.transform = torchvision.models.detection.transform.GeneralizedRCNNTransform(min_size=1000, max_size=1000, image_mean=[0.485, 0.456, 0.406], image_std=[0.229, 0.224, 0.225])

However, the model returns empty bounding boxes both before and after training.
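
To make the effect of the transform settings concrete, here is a rough sketch of the resize arithmetic for a 1000 x 1000 input. It assumes the transform scales the shorter side to min_size unless the longer side would exceed max_size, and that batching pads each side up to a multiple of 32. The helper name `resized_shape` is made up for illustration and is not part of torchvision.

```python
import math

def resized_shape(height, width, min_size, max_size, size_divisible=32):
    """Approximate (height, width) after resizing, and after batch padding."""
    # Scale so the shorter side reaches min_size, unless that would push
    # the longer side past max_size (assumed resize rule).
    scale = min(min_size / min(height, width), max_size / max(height, width))
    new_h, new_w = math.floor(height * scale), math.floor(width * scale)
    # Batching is assumed to pad each side up to a multiple of size_divisible.
    pad_h = math.ceil(new_h / size_divisible) * size_divisible
    pad_w = math.ceil(new_w / size_divisible) * size_divisible
    return (new_h, new_w), (pad_h, pad_w)

default_shapes = resized_shape(1000, 1000, min_size=800, max_size=1333)
custom_shapes = resized_shape(1000, 1000, min_size=1000, max_size=1000)
print(default_shapes)  # ((800, 800), (800, 800))
print(custom_shapes)   # ((1000, 1000), (1024, 1024))
```

Under these assumptions the default settings would shrink the images to 800 x 800, while the custom transform keeps them at 1000 x 1000 (padded to 1024 x 1024), so the transform change by itself does not look like an obvious cause of empty output.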

Update: I have tried different combinations and found that setting pretrained=False or num_classes=2 when constructing the model results in empty bounding boxes in the output. Can anyone clarify why this happens and what can be done to correct it?
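
One possible contributor to the empty output before training is the score threshold; this is a hypothesis, not a confirmed diagnosis. The sketch below assumes that the classification head is initialized with a prior probability of 0.01 and that post-processing discards detections scoring at or below 0.05. Both values are assumed torchvision defaults and should be checked against the installed version.

```python
import math

PRIOR_PROBABILITY = 0.01  # assumed default for the classification head
SCORE_THRESH = 0.05       # assumed default for RetinaNet post-processing

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Bias given to the classification logits at initialization (assumed rule).
initial_bias = -math.log((1 - PRIOR_PROBABILITY) / PRIOR_PROBABILITY)
# With near-zero initial weights, every anchor's score starts close to this.
initial_score = sigmoid(initial_bias)
survives_threshold = initial_score > SCORE_THRESH
print(round(initial_score, 4), survives_threshold)  # 0.01 False
```

If this holds, an untrained head scores every anchor near 0.01, so nothing passes the threshold. Passing a lower `score_thresh` to the constructor may make low-confidence boxes visible for inspection. This would not explain empty output after training, which may point to a problem in the training data or the training loop instead.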