Resizing with a single value after PadSquare (from AugLy) gives RuntimeError: stack expects each tensor to be equal size

I am facing a somewhat bizarre problem: I have a set of differently sized images that I am trying to train and infer on, and I use the following transform:

        self.infer_transform = transforms.Compose([
            imaugs.PadSquare(p=1),
            transforms.Resize([384], interpolation=torchvision.transforms.InterpolationMode.BICUBIC),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

When I use a batch size > 1, I get:

RuntimeError: stack expects each tensor to be equal size, but got [3, 384, 384] at entry 0 and [3, 385, 384] at entry 3
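For context, this error comes from the DataLoader's default collate function, which calls `torch.stack` on the per-sample tensors; a single off-by-one shape is enough to trip it. A minimal sketch (the sizes are taken from the error message above):

```python
import torch

# The default collate_fn stacks per-sample tensors into one batch tensor,
# which requires every sample to have exactly the same shape.
a = torch.zeros(3, 384, 384)
b = torch.zeros(3, 385, 384)  # one extra row, as in the error above

try:
    batch = torch.stack([a, b])
except RuntimeError as e:
    print(e)  # "stack expects each tensor to be equal size, ..."
```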

I find this really bizarre: after PadSquare, resizing with a single int should give me back a square image, but apparently it does not. Why is this? Is this a bug? It almost looks like a round-off error (got [3, 384, 384] at entry 0 and [3, 385, 384] at entry 3).

However, if I do this:

        self.infer_transform = transforms.Compose([
            imaugs.PadSquare(p=1),
            transforms.Resize((384,384), interpolation=torchvision.transforms.InterpolationMode.BICUBIC),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

it works fine…

What is the reason behind this? I am perplexed! When I try sample images in, say, Colab, they seem to have the same size…

No, it’s not a bug and is expected behavior for non-square input images.
From the docs:

size (sequence or int) –
Desired output size. If size is a sequence like (h, w), output size will be matched to this. If size is an int, smaller edge of the image will be matched to this number. i.e, if height > width, then image will be rescaled to (size * height / width, size).

Using the tuple is thus the right approach for non-square inputs.

@ptrblck pardon my ignorance, but since I apply PadSquare first, which gives me a square image, the input to the next transform (Resize) is square, so an int should be fine - is that not the case? :frowning:

Yes, you are right. If you are padding the image so that it has the same height and width, a single value in Resize should work.
Could you add this my_print transformation to check which shapes are used?

import numpy as np

def my_print(img):
    # print the (H, W, C) shape of the image at this point in the pipeline
    print(np.array(img).shape)
    return img

transform = transforms.Compose([
    transforms.Lambda(lambda x: my_print(x)),
    imaugs.PadSquare(p=1),
    transforms.Lambda(lambda x: my_print(x)),
    transforms.Resize([384], interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.Lambda(lambda x: my_print(x)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

@ptrblck thank you for this my_print func - I will try it as soon as the GPUs become free. What I see is that, for some reason, I have an extra pixel along one edge, so I get [3, 385, 384] instead of [3, 384, 384]. For now I just CenterCrop to 384 and that resolves the issue, but yes, something does not add up!

Will get back with results.

Thank you again. Your help is invaluable.