I am facing a bit of a bizarre problem: I have a bunch of differently sized images that I am trying to train and infer on, and I am using the following transform as an example:
self.infer_transform = transforms.Compose([
    imaugs.PadSquare(p=1),
    transforms.Resize([384], interpolation=torchvision.transforms.InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
When I use a batch size > 1, I get this error:
RuntimeError: stack expects each tensor to be equal size, but got [3, 384, 384] at entry 0 and [3, 385, 384] at entry 3
I find this really strange: after PadSquare, resizing with a single int should give me a square image back, but apparently it does not. Why is this? Is it a bug? It almost looks like some round-off error ([3, 384, 384] at entry 0 vs. [3, 385, 384] at entry 3).
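To narrow it down, here is a minimal check I can run (the 511x640 input is just a hypothetical odd-difference example; my real images vary in size):

from PIL import Image
import augly.image as imaugs
import torchvision
from torchvision import transforms

# hypothetical RGB image whose width/height difference is odd
img = Image.new("RGB", (511, 640))

# pad to a square, as in my pipeline
padded = imaugs.PadSquare(p=1.0)(img)
print(padded.size)  # I expect a square here, e.g. (640, 640)

# resize with a single int, as in the transform above
resized = transforms.Resize(
    [384], interpolation=torchvision.transforms.InterpolationMode.BICUBIC
)(padded)
print(resized.size)  # I expect (384, 384) if the padded image really is square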
However, if I do this:
self.infer_transform = transforms.Compose([
    imaugs.PadSquare(p=1),
    transforms.Resize((384, 384), interpolation=torchvision.transforms.InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
it works fine…
What is the reason behind this? I am perplexed! When I try out sample images in, say, Colab, they seem to come out the same size (see the check below)…
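For reference, this is roughly what I run in Colab to compare the final tensor shapes (the two file paths are placeholders for my sample images):

import torch
import torchvision
from torchvision import transforms
from PIL import Image
import augly.image as imaugs

# same Compose as the first (single-int Resize) snippet above
infer_transform = transforms.Compose([
    imaugs.PadSquare(p=1),
    transforms.Resize([384], interpolation=torchvision.transforms.InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# "sample_a.jpg" / "sample_b.jpg" are placeholders for my sample images
a = infer_transform(Image.open("sample_a.jpg").convert("RGB"))
b = infer_transform(Image.open("sample_b.jpg").convert("RGB"))
print(a.shape, b.shape)  # these come out the same size for the samples I tried
torch.stack([a, b])      # roughly what batching does; no size error on these samples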