Faster-RCNN input data in training mode

Hello there. Can anyone help me understand the docs (link)?
The documentation says:

The input to the model is expected to be a list of tensors, each of shape [C, H, W], one for each
image, and should be in 0-1 range. Different images can have different sizes.
The behavior of the model changes depending if it is in training or evaluation mode.
During training, the model expects both the input tensors, as well as a targets (list of dictionary),
- boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with values
between 0 and H and 0 and W
- labels (Int64Tensor[N]): the class label for each ground-truth box

Unfortunately, I still don’t understand the input format :smile: Based on the documentation, the class expects:

# list of tensors
images = [tensor([C,H,W]), tensor([C,H,W]), tensor([C,H,W])]

# list of dictionaries
targets = [{'boxes': tensor([1,4]), 'labels': tensor([1])},
           {'boxes': tensor([1,4]), 'labels': tensor([1])},
           {'boxes': tensor([1,4]), 'labels': tensor([1])}]

And if we have more bounding boxes for each image, then boxes will be a tensor([N_boxes, 4]) and labels a tensor([N]), am I right?
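To make the format above concrete, here is a small sketch with hypothetical values (two images with different sizes and different numbers of boxes, pure torch, no model call):

```python
import torch

# Hypothetical batch of two images with different sizes, matching the
# documented format: each image is a [C, H, W] tensor in the 0-1 range.
images = [torch.rand(3, 224, 224), torch.rand(3, 300, 400)]

# One dict per image; 'boxes' is [N, 4] in [x1, y1, x2, y2], 'labels' is [N].
targets = [
    {"boxes": torch.tensor([[10., 20., 100., 150.]]),       # N = 1
     "labels": torch.tensor([1], dtype=torch.int64)},
    {"boxes": torch.tensor([[5., 5., 50., 60.],
                            [30., 40., 200., 250.]]),       # N = 2
     "labels": torch.tensor([1, 2], dtype=torch.int64)},
]

for t in targets:
    print(t["boxes"].shape, t["labels"].shape)
# torch.Size([1, 4]) torch.Size([1])
# torch.Size([2, 4]) torch.Size([2])
```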

And one more question, about __getitem__. I don’t know how to return a list of dictionaries in this format…
I do

targets = [{'boxes': bbox, 'labels': label}]
return img, targets

and this is wrong, I suppose…
I have only one bounding box and one label per image, and I get these shapes (where 8 is the batch size):

# shape of images
torch.Size([8, 3, 224, 224])
# shape of targets
[{'boxes': torch.Size([8, 1, 4]),
 'labels': torch.Size([8])}]
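Those shapes come from the DataLoader's default collation, which stacks matching tensors across the batch instead of keeping one dict per image. A minimal reproduction (assuming `torch.utils.data.default_collate`, which is what DataLoader uses by default in recent PyTorch versions):

```python
import torch
from torch.utils.data import default_collate  # public since PyTorch 1.11

# Eight hypothetical samples shaped like the __getitem__ above:
# (img, [ {boxes, labels} ]) with a single box per image.
batch = [(torch.rand(3, 224, 224),
          [{"boxes": torch.rand(1, 4), "labels": torch.tensor(1)}])
         for _ in range(8)]

images, targets = default_collate(batch)
print(images.shape)                # torch.Size([8, 3, 224, 224])
print(targets[0]["boxes"].shape)   # torch.Size([8, 1, 4])
print(targets[0]["labels"].shape)  # torch.Size([8])
```

So the per-image dicts get merged into one batched dict, which is not what the model expects.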

Thanks for your attention


I solved the problems:

  1. Yes, the class expects a list of dictionaries.
  2. __getitem__ is correct; the problem was in the DataLoader and the batches it returned.


from torch.utils.data import DataLoader

def collate(batch):
    # Keep a list of per-sample targets instead of stacking them
    return tuple(zip(*batch))

dataloader = DataLoader(dataset, batch_size=5, collate_fn=collate)

And I get a list of dictionaries.
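For anyone else landing here, this is what the custom collate does, sketched with hypothetical samples (assuming __getitem__ returns an (image, target-dict) pair, as in the torchvision detection tutorial):

```python
import torch

def collate(batch):
    # Transpose a list of (image, target) pairs into (images, targets)
    return tuple(zip(*batch))

# Two hypothetical samples, as a Dataset's __getitem__ might return them
sample1 = (torch.rand(3, 224, 224),
           {"boxes": torch.tensor([[0., 0., 10., 10.]]),
            "labels": torch.tensor([1])})
sample2 = (torch.rand(3, 300, 400),
           {"boxes": torch.tensor([[5., 5., 20., 20.]]),
            "labels": torch.tensor([2])})

images, targets = collate([sample1, sample2])
print(len(images), len(targets))   # 2 2
print(targets[0]["boxes"].shape)   # torch.Size([1, 4])
```

Each target stays its own dict, so the batch can be passed straight to the model as `model(list(images), list(targets))`.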


Thank you so much! I was stuck on this part.