Output of resnet_fpn_backbone (pretrained)

I'm trying to get the output of a pretrained resnet_fpn_backbone for further processing in object detection. I'm using the fine-tuning model from torchvision (fasterrcnn_resnet50_fpn) to test pretraining. Unfortunately, I have failed several times and I don't know what to change.

I used the following code:

import numpy as np
import torch
import torch.nn as nn
import torchvision

#MAKE MODEL

class Identity(nn.Module):
    def __init__(self):
        super(Identity, self).__init__()
        
    def forward(self, x):
        return x

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.rpn = Identity()
model.roi_heads = Identity()

#MAKE INPUT-TEST-DATA

images = torch.rand(2, 3, 600, 1200)

boxes = np.array([[[0.29, 0.10, 0.43, 0.58],[0.24, 0.17, 0.33, 0.48]],[[0.04, 0.10, 0.63, 0.58],[0.29, 0.16, 0.43, 0.78]]])
boxes = boxes.astype(np.float32)
boxes = torch.from_numpy(boxes)
labels = torch.randint(1, 91, (2, 2))
images = list(image for image in images)
targets = []
for i in range(len(images)):
    d = {}
    d['boxes'] = boxes[i]
    d['labels'] = labels[i]
    targets.append(d)

#TEST MODEL (in model.train())

output = model(images, targets)     #I fail here!
print(output)

This is the error I receive:

I also tried using this instead:

model = torchvision.models.detection.backbone_utils.resnet_fpn_backbone('resnet50',pretrained=True)
output = model(images, targets)

But it throws the following error: