Pytorch dataloader: data shape issue with custom COCO dataset for model finetuning

I have an object detection task for which I prepared images and annotations.* The objective is to fine-tune an existing model with PyTorch. The images (PNGs) are stored in the same folder as the COCO JSON annotations. The annotations use the COCO Object Detection format:


    "info": {...},
     "images": [
            {
                "id": 1,
                "width": 4961,
                "height": 3508,
                "file_name": "somefile.png",
                "license": 1,
                "coco_url": null,
                "date_captured": null
            },
            ...
     ],
     "annotations": [
            {
                "id": 1,
                "image_id": 1,
                "category_id": 3,
                "segmentation": [[4372,2830], [4803,2830],[4803,3131],[4372,3131]],
                "area": 129731,
                "bbox": [4587.5,2980.5,431,301],
                "iscrowd": 0
            }, ...
    ],
    "categories": [
            {
                "id": 1,
                "name": "someCategoryName",
                "supercategory": "someSuperCategoryName"
            },
            ...
    ],
    "licenses": [...]

I need to read this dataset to fine-tune the model. This is how I'm loading it:

    import torchvision.datasets as dset
    import torchvision.transforms as transforms

    # Define a transform to convert each PIL image to a torch tensor
    transform = transforms.Compose([transforms.PILToTensor()])
    train_dataset = dset.CocoDetection(root=dataset_path, annFile=annotationFilePath, transform=transform)

I then define a DataLoader as follows; calling next() on it works fine:

    from torch.utils.data import DataLoader

    train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True, num_workers=4, collate_fn=lambda batch: tuple(zip(*batch)))
    images, targets = next(iter(train_loader))  # check that `next()` works

The contents of the images and targets objects look like this:

    images: [tensor([[[255, 255, ...rch.uint8), tensor([[[255, 255, ...rch.uint8)]
    targets: [{'id': 265, 'image_id': 265, 'category_id': 2, 'segmentation': [...], 'area': 1726288, 'bbox': [...], 'iscrowd': 0}]

In particular, note that targets is a list of dictionaries, where each dictionary retains the shape of an annotations entry from the JSON (see the snippet above).
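
That is expected from the collate_fn alone: zip(*batch) is a pure regrouping and does not touch the targets' structure. With toy data (placeholder strings and dicts) it behaves like this:

```python
# Toy demonstration of the lambda collate_fn above: zip(*batch)
# regroups a list of (image, target) pairs into one tuple of images
# and one tuple of targets, leaving each target's structure untouched.
batch = [("img0", [{"id": 1}]), ("img1", [{"id": 2}])]
images, targets = tuple(zip(*batch))
print(images)   # ('img0', 'img1')
print(targets)  # ([{'id': 1}], [{'id': 2}])
```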

Finally, I test whether the model forward pass actually runs:

    from torchvision import models
    from torchvision.models.detection import FasterRCNN_MobileNet_V3_Large_FPN_Weights

    model = models.detection.fasterrcnn_mobilenet_v3_large_fpn(weights=FasterRCNN_MobileNet_V3_Large_FPN_Weights.COCO_V1)
    output = model(images, targets)  # check if `model()` works

However, this always raises:

    TypeError: list indices must be integers or slices, not str

This error comes from the forward() method of generalized_rcnn.py, which expects each entry of targets to be a dict with a "boxes" key when it iterates over them.
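
For context on what forward() wants: torchvision's detection models expect, per image, a dict with at least a "boxes" tensor (float, in [x1, y1, x2, y2]) and a "labels" tensor (int64). To illustrate the mismatch, here is a sketch using a hypothetical helper of my own (coco_to_model_target, not a torchvision function) that reshapes one CocoDetection annotation list; the arithmetic assumes COCO's [x, y, width, height] bbox convention:

```python
import torch

# Hypothetical helper: turn a CocoDetection target (a list of COCO
# annotation dicts) into the {boxes, labels} dict that torchvision
# detection models expect. COCO bboxes are [x, y, width, height];
# the models want [x1, y1, x2, y2].
def coco_to_model_target(anns):
    boxes, labels = [], []
    for ann in anns:
        x, y, w, h = ann["bbox"]
        boxes.append([x, y, x + w, y + h])
        labels.append(ann["category_id"])
    return {
        "boxes": torch.as_tensor(boxes, dtype=torch.float32),
        "labels": torch.as_tensor(labels, dtype=torch.int64),
    }

# One annotation shaped like the JSON above
anns = [{"bbox": [4587.5, 2980.5, 431, 301], "category_id": 3}]
target = coco_to_model_target(anns)  # target["boxes"] is now [x1, y1, x2, y2]
```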

Now, I would have expected the DataLoader to shape the targets appropriately for forward(), given that I'm using a COCO dataset. Why isn't that the case, and what should I do to make it work? I can't find anything about this in the documentation.

* For completeness: for labeling, my company has to use a specific PDF annotator, so I wrote my own converter from its output to a COCO-shaped dataset. There may be something incorrect in the resulting COCO JSON, but I personally don't think this error is related to that.