I have an object detection task for which I prepared images and annotations*. The objective is to fine-tune an existing model with PyTorch. The images (PNGs) are stored in the same folder as the COCO JSON annotations. The annotations use the COCO object detection format:
"info": {...},
"images": [
{
"id": 1,
"width": 4961,
"height": 3508,
"file_name": "somefile.png",
"license": 1,
"coco_url": null,
"date_captured": null
},
...
],
"annotations": [
{
"id": 1,
"image_id": 1,
"category_id": 3,
"segmentation": [[4372,2830], [4803,2830],[4803,3131],[4372,3131]],
"area": 129731,
"bbox": [4587.5,2980.5,431,301],
"iscrowd": 0
}, ...
],
"categories": [
{
"id": 1,
"name": "someCategoryName",
"supercategory": "someSuperCategoryName"
},
...
],
"licenses": [...]
This is how I’m opening the dataset:
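import torchvision
from torch.utils.data import DataLoader
from torchvision import datasets as dset, models, transforms
# By default, define a transform to convert a PIL image to a Torch tensor
transform = transforms.Compose([transforms.PILToTensor()])
train_dataset = dset.CocoDetection(root=dataset_path, annFile=annotationFilePath, transform=transform)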
I’m then defining a DataLoader as follows. Calling next() on its iterator works fine:
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True, num_workers=4, collate_fn=lambda batch: tuple(zip(*batch)))
images, targets = next(iter(train_loader)) # check that `next()` works
The contents of the images and targets objects look like this:
images: [tensor([[[255, 255, ...rch.uint8), tensor([[[255, 255, ...rch.uint8)]
targets: [{'id': 265, 'image_id': 265, 'category_id': 2, 'segmentation': [...], 'area': 1726288, 'bbox': [...], 'iscrowd': 0}]
In particular, note that targets is a list of dictionaries, where each dictionary retains the shape of the annotations entry in the JSON (see the snippet above).
Finally, I test whether the model can actually run, with:
model = models.detection.fasterrcnn_mobilenet_v3_large_fpn(weights=torchvision.models.detection.FasterRCNN_MobileNet_V3_Large_FPN_Weights.COCO_V1)
output = model(images, targets) # check if `model()` works
However, this always raises:
TypeError: list indices must be integers or slices, not str
This error comes from the forward() method of generalized_rcnn.py, because it expects a “boxes” key when iterating over the targets.
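The failure can be reproduced without a model or a dataset. This is a minimal sketch, assuming each per-image target is the list of annotation dicts that CocoDetection returns for one image. The sample values and the helper name first_indexing_error are made up for illustration:

```python
# One entry per image; each entry is a *list* of COCO annotation dicts,
# which is what CocoDetection yields as the target of a single image.
targets = (
    [{"id": 1, "image_id": 1, "category_id": 3, "bbox": [10.0, 20.0, 30.0, 40.0]}],
    [],  # an image without annotations
)


def first_indexing_error(targets):
    """Return the message raised when a per-image target is indexed by 'boxes'."""
    for target in targets:
        try:
            target["boxes"]  # what a consumer expecting one dict per image does
        except TypeError as exc:
            return str(exc)
    return None


message = first_indexing_error(targets)
print(message)  # list indices must be integers or slices, not str
```

The string key is applied to the per-image list, not to one of the annotation dicts inside it. This suggests the model expects one dict per image, not one dict per annotation.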
Now, I would have expected the DataLoader to shape the targets appropriately for the forward() method, given that I’m using a COCO dataset. Why isn’t that so? What should I do to make it work? I cannot find anything about this in the documentation.
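One possible direction is to reshape the targets in the collate function. The sketch below is not a confirmed fix. It assumes the bbox values follow the COCO convention [x_min, y_min, width, height] and that category_id can be used directly as the label. The names coco_to_detection_target and collate are hypothetical, and the output uses plain lists so the reshaping step can be checked on its own:

```python
def coco_to_detection_target(annotations):
    """Collapse one image's COCO annotations into a single dict.

    Boxes go from COCO [x_min, y_min, width, height] to
    [x_min, y_min, x_max, y_max]; labels are taken from category_id.
    """
    boxes, labels = [], []
    for ann in annotations:
        x, y, w, h = ann["bbox"]
        if w <= 0 or h <= 0:
            continue  # skip degenerate boxes (zero or negative size)
        boxes.append([x, y, x + w, y + h])
        labels.append(ann["category_id"])
    return {"boxes": boxes, "labels": labels}


def collate(batch):
    """Hypothetical collate_fn: keep images as a tuple, reshape each target."""
    images, targets = zip(*batch)
    return images, tuple(coco_to_detection_target(t) for t in targets)


# Stand-in batch: strings instead of image tensors, made-up annotation values.
batch = [
    ("img0", [{"bbox": [10, 20, 30, 40], "category_id": 3}]),
    ("img1", []),
]
images, targets = collate(batch)
print(targets[0])  # {'boxes': [[10, 20, 40, 60]], 'labels': [3]}
```

Before calling the model, the lists would still need to become tensors, for example torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4) and torch.as_tensor(labels, dtype=torch.int64). The torchvision detection models also expect float images in the 0-1 range rather than the uint8 tensors that PILToTensor produces.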
* For completeness: to label, my company needs to use a specific PDF annotator. Therefore, I wrote my own converter from that output to get a COCO-shaped dataset. There may be something incorrect in the COCO json annotations I obtained, but I personally do not think this error is related to that.
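A quick sanity check on the converter output may still be worthwhile. In COCO, bbox is [x_min, y_min, width, height], measured from the top-left corner. The sketch below recomputes the box from the polygon in the annotation sample above and compares it with the stored value. The helper bbox_from_polygon is hypothetical, and it accepts [x, y] pairs as in the sample, whereas standard COCO polygons are flat lists [x1, y1, x2, y2, ...]:

```python
def bbox_from_polygon(points):
    """Return [x_min, y_min, width, height] enclosing a list of [x, y] points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


# Values copied from the annotation sample in the question.
polygon = [[4372, 2830], [4803, 2830], [4803, 3131], [4372, 3131]]
stored_bbox = [4587.5, 2980.5, 431, 301]

expected = bbox_from_polygon(polygon)
print(expected)                 # [4372, 2830, 431, 301]
print(expected == stored_bbox)  # False
```

For this sample the width, height and area agree, but the first two stored values match the centre of the polygon, not its top-left corner. If that holds across the dataset, the boxes would be shifted once the targets are accepted by the model. It would not explain the TypeError itself.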