If you want to create a batch containing data with different shapes, you could use a custom collate_fn as described here.
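
As a rough illustration, a minimal sketch of such a collate_fn could look like the code below. The dataset, its shapes, and the name list_collate are made up for this example and are not taken from your code. The idea is to return the samples as Python lists instead of letting the default collate function try to stack tensors of different shapes.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class VariableSizeDataset(Dataset):
    """Hypothetical dataset: images of different sizes, each with its own keypoints."""

    def __init__(self):
        # Made-up (C, H, W) shapes, only used to get differently sized samples
        self.shapes = [(3, 32, 48), (3, 40, 40), (3, 24, 64), (3, 50, 30)]

    def __len__(self):
        return len(self.shapes)

    def __getitem__(self, idx):
        image = torch.zeros(self.shapes[idx])
        # Varying number of (x, y) keypoints per sample
        keypoints = torch.zeros(idx + 1, 2)
        return image, keypoints


def list_collate(batch):
    # batch is a list of (image, keypoints) tuples returned by __getitem__.
    # Keep the samples in lists, since tensors of different shapes cannot be stacked.
    images = [sample[0] for sample in batch]
    keypoints = [sample[1] for sample in batch]
    return images, keypoints


loader = DataLoader(VariableSizeDataset(), batch_size=2, collate_fn=list_collate)
images, keypoints = next(iter(loader))
```

Each batch is then a pair of lists, so you would have to loop over the samples, or pad or resize them to a common shape, before passing them to a model that expects a single batched tensor.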
However, deeplabv3_resnet101 is a segmentation model, so your keypoint prediction might not work out of the box. That's just a side note, though, and you might already have a plan for how to use the model for your use case.