(In training loop) TypeError: string indices must be integers, not 'str'

I’m attempting to train maskrcnn_resnet50_fpn using a custom dataset consisting of grayscale images and binary masks. I’m creating a dataset as follows:

import os

import numpy as np
import torch
import torchvision.transforms.functional as F
from PIL import Image
from torch.utils.data import DataLoader
from torchvision.ops import masks_to_boxes

class CustomDataset(torch.utils.data.Dataset):
    def __init__(self, root_dir):
        self.root_dir = root_dir
        self.image_files = os.listdir(os.path.join(root_dir, 'images'))

    def __len__(self):
        return len(self.image_files)

    def __getitem__(self, idx):
        img_name = os.path.join(self.root_dir, 'images', self.image_files[idx])
        mask_name = os.path.join(self.root_dir, 'masks', self.image_files[idx])
        image = Image.open(img_name).convert('RGB').resize((1024,1024))
        mask = Image.open(mask_name).convert('L').resize((1024, 1024), Image.NEAREST)  # nearest keeps the mask binary
        # Normalize pixel values to [0, 1] and convert HWC -> CHW,
        # the layout the tensor flips and the model expect
        image = (torch.tensor(np.array(image)) / 255.0).permute(2, 0, 1)
        mask = torch.tensor(np.array(mask)) / 255.0
        mask = mask.unsqueeze(0)
        # Apply horizontal flip with 50% probability
        if torch.rand(1) < 0.5:
            image = F.hflip(image)
            mask = F.hflip(mask)
        # Apply vertical flip with 50% probability
        if torch.rand(1) < 0.5:
            image = F.vflip(image)
            mask = F.vflip(mask)
        boxes = masks_to_boxes(mask)
        labels = torch.tensor([1])
        target = {
            'boxes': boxes,
            'labels': labels,
            'masks': mask
        }
        
        return image, target

When I run my training loop, the following code throws an error:

for images, targets in tqdm(train_data_loader, desc=f"Epoch {epoch + 1}/{num_epochs}"):
    optimizer.zero_grad()
    outputs = model(images, targets)
    ...

The error readout is:

File ~\AppData\Roaming\Python\Python311\site-packages\torchvision\models\detection\generalized_rcnn.py:65, in GeneralizedRCNN.forward(self, images, targets)
     63 else:
     64     for target in targets:
---> 65         boxes = target["boxes"]
     66         if isinstance(boxes, torch.Tensor):
     67             torch._assert(
     68                 len(boxes.shape) == 2 and boxes.shape[-1] == 4,
     69                 f"Expected target boxes to be a tensor of shape [N, 4], got {boxes.shape}.",
     70             )

TypeError: string indices must be integers, not 'str'

But target does seem to be organized properly as a dictionary. With a batch size of 8, if I print targets just before the statement outputs = model(images, targets), I get:

{'boxes': tensor([[[ 77., 528., 159., 627.]],
        [[482., 439., 865., 654.]],
        [[173., 454., 485., 563.]],
        [[152., 409., 206., 449.]],
        [[352., 497., 399., 562.]],
        [[873., 490., 957., 547.]],
        [[564., 489., 731., 605.]],
        [[832., 238., 964., 363.]]]), ...

So I’m not sure why I’m seeing this error.

Based on the error message and the code, I would assume targets is expected to be a list containing the target dicts. In your code it seems that targets is the dict itself, which fails with:

targets = {'boxes': torch.randn(1),
           'labels': torch.randn(1),
           'masks': torch.randn(1),
}

for target in targets:
    target["boxes"]
# TypeError: string indices must be integers

# this works
for target in [targets]:
    target["boxes"]

Could you check if you would need to pass targets as a list or another container?
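For reference, this is the input format torchvision's detection models expect during training: a list of image tensors plus a list of per-image target dicts. A minimal sketch with dummy values (shapes per the torchvision docs, assuming model is your maskrcnn_resnet50_fpn instance):

# images: list of [C, H, W] float tensors; targets: list of per-image dicts
images = [torch.rand(3, 1024, 1024) for _ in range(2)]
targets = [
    {
        'boxes': torch.tensor([[10.0, 10.0, 100.0, 100.0]]),     # [N, 4]
        'labels': torch.tensor([1]),                              # [N]
        'masks': torch.zeros(1, 1024, 1024, dtype=torch.uint8),  # [N, H, W]
    }
    for _ in range(2)
]
loss_dict = model(images, targets)  # in train mode, returns a dict of losses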

Thank you! Replacing outputs = model(images, targets) with outputs = model(images, [targets]) eliminates that error but produces a new one: AssertionError: Expected target boxes to be a tensor of shape [N, 4], got torch.Size([8, 1, 4]). Seems like there should be a squeeze(1) statement somewhere (rough sketch of what I mean below). Omitting the mask = mask.unsqueeze(0) statement in __getitem__ produces a ValueError: not enough values to unpack (expected 2, got 1) in the training loop.
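Something like this, though I'm not sure where it would belong (untested sketch):

# hypothetical workaround: collapse the extra dimension the batching added,
# turning boxes of shape [8, 1, 4] into [8, 4] before the forward call
targets['boxes'] = targets['boxes'].squeeze(1)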

I should note that I’m setting up the data loader as follows:

train_dataset = CustomDataset(trainpath)
train_data_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)

Assuming that a datapoint from your dataset is a dict

        target = {
            'boxes': boxes, # shape [1, 4]
            ...
        }

then after the collate_fn it seems that targets now has the shape

        targets = {
            'boxes': boxes, # shape [8, 1, 4]
            ...
        }

That’s because the default collate_fn stacks all tensors that share the same key across the batch into one tensor, so instead of a list of dictionaries you now have a single dictionary of stacked values. If you want to extract the boxes of a batch, you can rewrite the check like this:

    boxes = targets["boxes"]
    for box in boxes:
        if isinstance(box, torch.Tensor):
            torch._assert(
                len(box.shape) == 2 and box.shape[-1] == 4,
                f"Expected target boxes to be a tensor of shape [N, 4], got {box.shape}.",
            )
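Alternatively, you can keep targets as a list of per-image dicts, which is the format the detection models expect, by passing a custom collate_fn to the DataLoader. A sketch reusing your dataset setup (this is also what torchvision's detection reference scripts do):

def collate_fn(batch):
    # batch is a list of (image, target) tuples from __getitem__;
    # regroup into a tuple of images and a tuple of target dicts instead
    # of letting the default collate_fn stack the tensors key by key
    return tuple(zip(*batch))

train_data_loader = DataLoader(train_dataset, batch_size=8, shuffle=True,
                               collate_fn=collate_fn)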

Thank you. This issue seems to involve the DataLoader; I'm going to post a new question.