Pascal VOC Detection input shape

I’m trying to use the torchvision pascal voc package

from torchvision import transforms
from torchvision.datasets import VOCDetection
from torch.utils.data import DataLoader, random_split

My problem is that I’m having issues when resizing the images with transforms.

data_transforms = transforms.Compose([
        transforms.Resize(size=(256, 256)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

dataset = VOCDetection(root="./", download=True, transform=data_transforms)
train_size = int(len(dataset) * 0.9)
val_size = len(dataset) - train_size
train, val = random_split(dataset, [train_size, val_size])

train_loader = DataLoader(train, batch_size=32, num_workers=4)
val_loader = DataLoader(val, batch_size=32, num_workers=4)

I’m passing the size as a tuple, but I still get this error when I run it:

RuntimeError: each element in list of batch should be of equal size

If I pass something like transforms.Resize(size=(256)) I get a reasonble error

RuntimeError: stack expects each tensor to be equal size, but got [3, 256, 341] at entry 0 and [3, 341, 256] at entry 1

which makes sense based on the transforms.Resize documentation. What confuses me is that I still get an error even when passing the size as a tuple.

I’ve never worked with these detection datasets, so if there’s anything I should know, I’d appreciate the help.

Is it right to use batch sizes larger than 1, or is it standard, like in segmentation, to use 1?

I would guess that the error is raised while trying to collate the targets, since you are only resizing the input images. Use target_transform to transform the targets as well, or use the transforms argument of the dataset, which accepts a callable taking both an image and a target.
