Training an instance segmentation model on torchvision.datasets.Cityscapes

I want to use the code in the Object Detection Finetuning Tutorial to train and test a Mask-RCNN model on torchvision.datasets.Cityscapes.

I am having trouble with the train_one_epoch function, similar to what is described in pytorch/vision issue #1699 ("'train_one_epoch' gives error while using COCO annotations"). The solution given there is that targets should be a list of dictionaries, but I don't know how to convert the target returned by the Cityscapes class (a PIL image) into a list of dictionaries like that.

Is there an easier way to train a model on Cityscapes without having to do this?
It seems that the targets are expected in the "COCO format", so this repository, which appears to convert Cityscapes to COCO, might be useful.
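
If you would rather build the targets directly instead of converting the dataset, here is a minimal sketch of turning one Cityscapes instance image into the kind of dictionary (boxes, labels, masks) that the tutorial's Mask R-CNN training loop consumes. It assumes the dataset was created with target_type="instance", and that the instance PNG uses the usual Cityscapes encoding where a pixel value of 1000 or more means class_id * 1000 + instance_index, while smaller values are classes without separate instances. The function name instance_png_to_target and the class_to_label argument are made up for this example and are not part of torchvision.

```python
import numpy as np


def instance_png_to_target(instance_img, class_to_label=None):
    """Convert a Cityscapes instance-ID image into a detection-style target dict.

    Hypothetical helper, not a torchvision API.
    Assumption: pixel values >= 1000 encode class_id * 1000 + instance_index;
    values < 1000 carry no per-instance information and are skipped.
    class_to_label is an optional dict mapping Cityscapes class ids to the
    contiguous labels (1..K, with 0 reserved for background) used by your model.
    """
    ids = np.asarray(instance_img, dtype=np.int64)
    height, width = ids.shape
    boxes, labels, masks = [], [], []

    for inst_id in np.unique(ids):
        if inst_id < 1000:
            continue  # background or a class without instance annotations
        class_id = int(inst_id) // 1000
        if class_to_label is not None and class_id not in class_to_label:
            continue  # class we do not want to train on

        mask = ids == inst_id
        ys, xs = np.nonzero(mask)
        # +1 on the max side so that a one-pixel-wide object still gets a
        # box with positive width and height.
        boxes.append([xs.min(), ys.min(), xs.max() + 1, ys.max() + 1])
        labels.append(class_id if class_to_label is None else class_to_label[class_id])
        masks.append(mask)

    return {
        "boxes": np.asarray(boxes, dtype=np.float32).reshape(-1, 4),  # x1, y1, x2, y2
        "labels": np.asarray(labels, dtype=np.int64),
        "masks": np.asarray(masks, dtype=np.uint8).reshape(-1, height, width),
    }
```

To feed this to train_one_epoch, each array would still need to become a tensor (for example with torch.from_numpy), and the DataLoader needs a collate function that keeps the batch as tuples, such as lambda batch: tuple(zip(*batch)), so that the targets arrive as a list of dictionaries. Without class_to_label the labels are the raw Cityscapes class ids, which are not contiguous, so you will most likely want to pass a mapping that matches your model's num_classes.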

