KEYPOINT Detection issues

Hi! I have a big issue training a Keypoint R-CNN for keypoint detection.
My dataset consists of pictures of faces with a bounding box around each eye and one keypoint per box: the center of the eye.

The model definition is the one from torchvision:
model = torchvision.models.detection.keypointrcnn_resnet50_fpn(pretrained=False, progress=True, num_classes=2, num_keypoints=1, pretrained_backbone=True)

I am training the network with the COCO method:

# construct an optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.0001,
                            momentum=0.9, weight_decay=0.0005)

# and a learning rate scheduler which decreases the learning rate by
# 10x every 3 epochs
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                               step_size=3,
                                               gamma=0.1)

num_epochs = 7

for epoch in range(num_epochs):
    # train for one epoch, printing every 30 iterations
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=30)
    # update the learning rate
    lr_scheduler.step()
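
As a quick check of what this schedule does, here is a minimal sketch in plain Python (no PyTorch needed) that reproduces the learning rate a StepLR-style decay would give in each epoch, assuming `lr_scheduler.step()` is called once at the end of every epoch as in the loop above. The helper name `step_lr` is made up for illustration.

```python
def step_lr(base_lr, gamma, step_size, epoch):
    """Learning rate used during `epoch` (0-based) under a StepLR-style decay."""
    return base_lr * gamma ** (epoch // step_size)

# Values taken from the post: lr=0.0001, gamma=0.1, step_size=3, 7 epochs.
schedule = [step_lr(1e-4, 0.1, 3, epoch) for epoch in range(7)]

for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: lr = {lr:.0e}")
```

With only 7 epochs, the rate drops to about 1e-5 from epoch 3 and about 1e-6 in the last epoch, so most of the learning has to happen in the first three epochs.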

Now, this is the issue I get every time: I can’t even train for one epoch.
I have already checked that my data is passed correctly, and the data loader is implemented as described in the torchvision documentation.
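
For anyone comparing their own data loader against the documentation: in training mode, torchvision's Keypoint R-CNN expects each target to be a dict with `boxes` (FloatTensor[N, 4] in x1, y1, x2, y2 order), `labels` (Int64Tensor[N]) and `keypoints` (FloatTensor[N, K, 3] as x, y, visibility). Below is a minimal sketch for the setup described here (one keypoint per eye box, placed at the box center). The helper name `make_eye_target` and the example coordinates are made up for illustration.

```python
import torch

def make_eye_target(eye_boxes):
    """Build one training target from a list of [x1, y1, x2, y2] eye boxes."""
    boxes = torch.as_tensor(eye_boxes, dtype=torch.float32).reshape(-1, 4)
    # Single foreground class ("eye"); label 0 is reserved for background.
    labels = torch.ones((boxes.shape[0],), dtype=torch.int64)
    # One keypoint per box at the box center, marked visible (v = 1).
    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    vis = torch.ones_like(cx)
    keypoints = torch.stack([cx, cy, vis], dim=1).unsqueeze(1)  # [N, 1, 3]
    return {"boxes": boxes, "labels": labels, "keypoints": keypoints}

# Two hypothetical eye boxes in pixel coordinates.
target = make_eye_target([[30, 40, 50, 52], [80, 41, 100, 53]])
print(target["keypoints"].shape)
```

If K in the `keypoints` tensor does not match the `num_keypoints` the model was built with, that may be one thing worth ruling out when training fails.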

The following image shows the exception:

Has anyone already encountered this problem? Thank you very much for your help.

I have worked on keypoint detection before. Perhaps you could try the following, which makes sure the number of keypoints is changed from the default 17 to the number you need:

model = torchvision.models.detection.keypointrcnn_resnet50_fpn(pretrained=False, progress=True, num_classes=2, pretrained_backbone=True)

model.roi_heads.keypoint_predictor = torchvision.models.detection.keypoint_rcnn.KeypointRCNNPredictor(512, 2)

In my project I needed 26 keypoints instead of the default 17, so I replaced the keypoint_predictor in roi_heads directly.

Hope this helps

Thank you! I had already tried this kind of solution too, but nothing changed, so I decided to write my own training procedure for this task, based on MSELoss, using a ResNeXt50 as the predictor with a final Linear layer that gives the desired output.

Ha, alright, glad to hear you found a solution.