Pascal VOC Detection input shape

I’m trying to use the torchvision Pascal VOC detection dataset:

from torchvision.datasets import VOCDetection

My problem is that I run into errors when resizing the images with transforms:

from torchvision import transforms

data_transforms = transforms.Compose([
    transforms.Resize(size=(256, 300)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

from torch.utils.data import DataLoader, random_split

dataset = VOCDetection(root="./", download=True, transform=data_transforms)
train_size = int(len(dataset) * 0.9)
val_size = len(dataset) - train_size
train, val = random_split(dataset, [train_size, val_size])

train_loader = DataLoader(train, batch_size=32, num_workers=4)
val_loader = DataLoader(val, batch_size=32, num_workers=4)

I’m passing the size as a tuple, but I still get this error when I run it:

RuntimeError: each element in list of batch should be of equal size

If I pass something like transforms.Resize(size=(256)) I get a reasonable error:

RuntimeError: stack expects each tensor to be equal size, but got [3, 256, 341] at entry 0 and [3, 341, 256] at entry 1

which makes sense based on the transforms.Resize documentation, since (256) is just an int, so only the smaller edge is matched and the output shape varies per image. What confuses me is that I still get the first error even when I pass the size as a proper tuple.
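For reference, this is the difference I see between the int and the tuple form of Resize (a quick sanity check on a dummy image, so the exact numbers are just illustrative):

from PIL import Image
from torchvision import transforms

img = Image.new("RGB", (500, 375))               # dummy VOC-sized image, PIL size is (W, H)
print(transforms.Resize(256)(img).size)          # (341, 256): smaller edge -> 256, aspect ratio kept
print(transforms.Resize((256, 300))(img).size)   # (300, 256): every image forced to H=256, W=300

So with the tuple every image tensor should already be [3, 256, 300] before batching.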

I’ve never worked with detection datasets before, so if there’s anything I should know, I’d appreciate the help.

Also, is it fine to use batch sizes larger than 1, or is it standard to use 1, like in segmentation?

I would guess that the error is raised while trying to batch the targets, since you are only resizing the input images.
Use target_transform to transform the targets as well, or use the transforms argument, which lets you pass a transformation to the dataset that accepts both an image and a target.
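For VOCDetection the target is the parsed XML annotation (a nested dict), not an image, so a tensor transform won’t apply to it directly. Here is a minimal sketch of the transforms approach; the function name, the fixed 256x300 size, and the box rescaling are illustrative assumptions, not torchvision API:

import torchvision.transforms.functional as F
from torchvision.datasets import VOCDetection


def voc_transforms(image, target):
    # Illustrative sketch: assumes the target dict layout produced by
    # VOCDetection, i.e. target["annotation"]["object"][i]["bndbox"].
    orig_w, orig_h = image.size        # PIL images report (width, height)
    new_h, new_w = 256, 300

    # Resize/normalize the image exactly like the original Compose pipeline.
    image = F.resize(image, [new_h, new_w])
    image = F.to_tensor(image)
    image = F.normalize(image, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])

    # Keep the bounding boxes consistent with the resized image.
    objects = target["annotation"]["object"]
    if isinstance(objects, dict):      # a single object may not be wrapped in a list
        objects = [objects]
    for obj in objects:
        box = obj["bndbox"]
        box["xmin"] = str(round(int(box["xmin"]) * new_w / orig_w))
        box["xmax"] = str(round(int(box["xmax"]) * new_w / orig_w))
        box["ymin"] = str(round(int(box["ymin"]) * new_h / orig_h))
        box["ymax"] = str(round(int(box["ymax"]) * new_h / orig_h))
    return image, target


dataset = VOCDetection(root="./", download=True, transforms=voc_transforms)

Note that even with equally sized images the annotation dicts contain a different number of objects per sample, which is what the default collate_fn complains about, so detection pipelines usually also pass a custom collate_fn (e.g. collate_fn=lambda batch: tuple(zip(*batch))) to the DataLoader; with that in place, batch sizes larger than 1 are fine.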
