Why does my PyTorch object detection model NOT learn anything?

I have an object detection task with one class of objects (plus the background). I am training a Faster R-CNN model and a RetinaNet model for the task. The training loss does not decrease at all. At the end of training, the Faster R-CNN model predicts about 100 bounding boxes, all in the top-left corner of the image and overlapping one another. The RetinaNet model returns empty predictions (no bounding boxes).

Since the training loss never decreases, I expect the models to perform poorly. What I cannot explain is why they learn nothing at all. Here are the relevant parts of the code:
Models:

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.transform import GeneralizedRCNNTransform

# RetinaNet: 2 classes (object + background), only the backbone is ImageNet-pretrained
model_retinanet = torchvision.models.detection.retinanet_resnet50_fpn(
    pretrained=False, pretrained_backbone=True, num_classes=2)
model_retinanet.transform = GeneralizedRCNNTransform(
    min_size=3040, max_size=4048,
    image_mean=[0.485, 0.456, 0.406], image_std=[0.229, 0.224, 0.225])

# Faster R-CNN: pretrained detector with its box predictor replaced for 2 classes
model_fasterrcnn = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
in_features = model_fasterrcnn.roi_heads.box_predictor.cls_score.in_features
model_fasterrcnn.roi_heads.box_predictor = FastRCNNPredictor(in_features, 2)
model_fasterrcnn.transform = GeneralizedRCNNTransform(
    min_size=3040, max_size=4048,
    image_mean=[0.485, 0.456, 0.406], image_std=[0.229, 0.224, 0.225])
```
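To see what the replaced transform does to each input image, the resize rule GeneralizedRCNNTransform applies can be approximated in plain Python. This is a sketch, not torchvision's code: rcnn_resize is a hypothetical helper, its rounding may differ from torchvision's interpolation by about a pixel, and the padded size assumes the default size_divisible=32 and a batch containing a single image.

```python
import math


def rcnn_resize(height, width, min_size=3040, max_size=4048, size_divisible=32):
    """Approximate how GeneralizedRCNNTransform resizes one image (hypothetical helper).

    The shorter side is scaled to min_size unless that would push the longer
    side past max_size. Returns (scale, (resized_h, resized_w), (padded_h, padded_w)).
    """
    scale = min(min_size / min(height, width), max_size / max(height, width))
    new_h = round(height * scale)
    new_w = round(width * scale)
    # When batching, each image is zero-padded up to a multiple of size_divisible
    # (for a batch of one, the padded size depends only on this image).
    pad_h = math.ceil(new_h / size_divisible) * size_divisible
    pad_w = math.ceil(new_w / size_divisible) * size_divisible
    return scale, (new_h, new_w), (pad_h, pad_w)


if __name__ == "__main__":
    for h, w in [(1080, 1920), (3000, 4000), (480, 640)]:
        scale, resized, padded = rcnn_resize(h, w)
        print(f"{h}x{w} -> scale {scale:.3f}, resized {resized}, padded {padded}")
```

With min_size=3040 and max_size=4048, a 1080×1920 image would come out at about 2277×4048 (roughly 2.1× upscaling) and be padded to 2304×4064 before it reaches the backbone.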

Training:

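```python
import torch
from engine import train_one_epoch  # references/detection/engine.py from the torchvision repo
for epoch in range(num_epochs):
    train_one_epoch(model, optimizer, data_loader_train, torch.device('cpu'), epoch, print_freq=10)
```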

Here, model is either model_retinanet or model_fasterrcnn. The function train_one_epoch is the one used in the PyTorch tutorial on fine-tuning an object detection model.
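One way to rule out a data problem is to check that each target matches what torchvision's detection models expect in training mode: target["boxes"] in absolute (x1, y1, x2, y2) pixel coordinates with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H, and target["labels"] as int64 class indices where 0 is the background. The helper below, check_detection_target, is hypothetical and only flags a few common mistakes; it does not check tensor dtypes.

```python
def check_detection_target(target, image_height, image_width):
    """Return a list of warnings for one target dict (hypothetical helper).

    Expects target["boxes"] as N boxes in (x1, y1, x2, y2) pixel coordinates
    and target["labels"] as N class indices, with 0 reserved for background.
    """
    boxes = target["boxes"]
    labels = target["labels"]
    # Accept torch tensors (via .tolist()) as well as plain Python lists.
    boxes = boxes.tolist() if hasattr(boxes, "tolist") else list(boxes)
    labels = labels.tolist() if hasattr(labels, "tolist") else list(labels)

    warnings = []
    if len(boxes) != len(labels):
        warnings.append(f"{len(boxes)} boxes but {len(labels)} labels")
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        if x2 <= x1 or y2 <= y1:
            warnings.append(
                f"box {i} has non-positive width/height; (x, y, w, h) format?")
        if x1 < 0 or y1 < 0 or x2 > image_width or y2 > image_height:
            warnings.append(
                f"box {i} lies outside the {image_width}x{image_height} image")
    # Pixel coordinates should exceed 1; values all in [0, 1] suggest normalization.
    if boxes and max(max(b) for b in boxes) <= 1.0:
        warnings.append("all coordinates <= 1; boxes may be normalized, not in pixels")
    if any(label == 0 for label in labels):
        warnings.append("label 0 is reserved for background; the object class should be 1")
    return warnings
```

Assuming the dataset returns (image tensor of shape [C, H, W], target dict) as in the tutorial, it could be run as img, target = data_loader_train.dataset[0] followed by print(check_detection_target(target, *img.shape[-2:])).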

PS: I have tried the same training task in MATLAB, and there the model seems to learn. However, MATLAB does not give me as much flexibility as PyTorch, so I am trying to get the model to learn in PyTorch.