As you can see, the second element of the return value is bboxes. Different images contain different numbers of objects, so the shapes of the bboxes tensors vary from sample to sample, which causes the DataLoader to throw an exception like:
RuntimeError: stack expects each tensor to be equal size, but got [1, 1, 5] at entry 0 and [1, 5, 5] at entry 1
when it tries to stack the bboxes (labels) into a batch. The question is: how should I handle this?
Usually, in such situations some sort of padding has to be introduced for each batch so all elements match in size. This can be achieved by defining your own collate_fn that is then passed to DataLoader as an argument.
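For example, a minimal sketch of a padding-based collate_fn (assuming each dataset item is an (image, bboxes) pair, the images already share one size, and bboxes is a float tensor of shape [num_objects, 5]; the function name and the fixed column count of 5 are assumptions taken from the error message above):

```python
import torch
from torch.utils.data import DataLoader

def pad_collate_fn(batch):
    # batch is a list of (image, bboxes) pairs; images share the same size,
    # but each bboxes tensor has shape [num_objects, 5] with varying num_objects.
    images = torch.stack([image for image, _ in batch])
    max_objects = max(bboxes.shape[0] for _, bboxes in batch)
    padded = torch.zeros(len(batch), max_objects, 5)
    for i, (_, bboxes) in enumerate(batch):
        padded[i, :bboxes.shape[0]] = bboxes  # remaining rows stay zero-padded
    return images, padded

# loader = DataLoader(dataset, batch_size=4, collate_fn=pad_collate_fn)
```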
Thanks, I may not have expressed my question clearly. The size (width/height) of every transformed image is the same; the problem is that the number of objects in each image within the same batch differs, not the height or width of the images.
By default, the DataLoader tries to stack the tensors to form a batch (it calls torch.stack on the current batch), but this fails if the tensors are not of equal size. With collate_fn it is possible to override this behavior and define your own “stacking procedure”. In the collate_fn sketched below, the batch argument contains a list of instances (image-bbox pairs, i.e. batch is of type List[Tuple[Image, Bbox]]), and with tuple(zip(*batch)) we form a batch where batch[0] corresponds to the images and batch[1] to the bboxes in the batch.
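A minimal sketch of that collate_fn, assuming each dataset item is an (image, bboxes) tuple (the dataset and batch size are placeholders):

```python
import torch
from torch.utils.data import DataLoader

def collate_fn(batch):
    # batch: List[Tuple[Image, Bbox]]; zip regroups it so that
    # element 0 is a tuple of all images and element 1 a tuple of all bboxes.
    return tuple(zip(*batch))

# loader = DataLoader(dataset, batch_size=4, collate_fn=collate_fn)
# images, bboxes = next(iter(loader))
# images = torch.stack(images)  # images share one size, so they can still be stacked
# bboxes stays a tuple of per-image tensors with varying numbers of objects
```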