transforms.Compose on a batch of images

Hello everyone!
I am new to PyTorch and I tried to implement random scaling of the input images (e.g. from 0.5 to 1.5) during training, as mentioned in the DeepLab paper. Here is the code.

import random
from PIL import Image

class RandomScale(object):
    def __init__(self, limit):
        self.limit = limit  # (min_scale, max_scale)

    def __call__(self, sample):
        img = sample['image']
        mask = sample['label']
        assert img.size == mask.size

        # sample one scale factor and resize both image and mask;
        # nearest-neighbor interpolation keeps the label values intact
        scale = random.uniform(self.limit[0], self.limit[1])
        w = int(scale * img.size[0])
        h = int(scale * img.size[1])

        img = img.resize((w, h), Image.BILINEAR)
        mask = mask.resize((w, h), Image.NEAREST)

        return {'image': img, 'label': mask}

# Augmentation
composed_transforms_tr = transforms.Compose([
        RandomScale((0.5, 1.5)),
        RandomHorizontalFlip(),
        ToTensor()])

However, as soon as training started, the following error was raised:

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 665 and 420 in dimension 2 at c:\new-builder_3\win-wheel\pytorch\aten\src\th\generic/THTensorMath.cpp:3616

This is because all images in a batch need to have the same size; 665 and 420 in the error above are the heights of two input images after random scaling.
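For context (this detail is not in the original post): the DataLoader's default collate function essentially calls torch.stack on the per-sample tensors, which is where the size requirement comes from. A minimal repro:

```python
import torch

# Samples with identical shapes stack into a batch without problems.
same = torch.stack([torch.randn(3, 420, 420) for _ in range(4)])
print(same.shape)  # torch.Size([4, 3, 420, 420])

# Mismatched spatial sizes reproduce the error from the traceback above.
failed = False
try:
    torch.stack([torch.randn(3, 665, 665), torch.randn(3, 420, 420)])
except RuntimeError:
    failed = True
```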

So I wonder if there is a way to transform a batch of images simultaneously, or if someone could suggest how to implement random scaling of the inputs?

Thank you very much!

You could sample the random crop size once for the next batch and resample it in each iteration of your DataLoader loop.
Maybe not the most elegant approach, but it should do the job.

Here is a small dummy example:

import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
import torchvision.transforms.functional as TF

class MyDataset(Dataset):
    def __init__(self, limit):
        self.data = [TF.to_pil_image(x) for x in torch.randn(64, 3, 24, 24)]
        self.limit = limit  # (min_size, max_size) for the random crop
        self.resample_size(0)

    def resample_size(self, index):
        # draw one crop size and keep the crop coordinates fixed
        # until resample_size is called again
        size = torch.empty(2).uniform_(self.limit[0], self.limit[1]).long().tolist()
        self.crop_coord = transforms.RandomCrop.get_params(self.data[index], size)

    def __getitem__(self, index):
        i, j, h, w = self.crop_coord
        img = self.data[index]
        img = TF.crop(img, i, j, h, w)
        x = TF.to_tensor(img)
        return x

    def __len__(self):
        return len(self.data)


dataset = MyDataset(limit=[18, 22])
loader = DataLoader(
    dataset,
    batch_size=4
)


for data in loader:
    # draw a new crop size for the next batch
    loader.dataset.resample_size(0)
    print(data.shape)
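If the per-batch resampling feels awkward, another workaround (a sketch of my own, not from this thread, assuming the same dict-of-PIL-images sample format as in the question) is to random-scale first and then pad/crop back to a fixed output size, so every sample ends up with the same shape and the default collate works unchanged:

```python
import random
from PIL import Image

class RandomScaleFixedCrop:
    """Randomly scale image and mask, then pad/crop to a fixed size."""
    def __init__(self, limit, out_size):
        self.limit = limit        # (min_scale, max_scale)
        self.out_size = out_size  # (width, height) of the final crop

    def __call__(self, sample):
        img, mask = sample['image'], sample['label']
        scale = random.uniform(*self.limit)
        w, h = int(scale * img.size[0]), int(scale * img.size[1])
        img = img.resize((w, h), Image.BILINEAR)
        mask = mask.resize((w, h), Image.NEAREST)

        ow, oh = self.out_size
        # pad (top-left anchored) if the scaled image is smaller than the crop
        if w < ow or h < oh:
            padded_img = Image.new(img.mode, (max(w, ow), max(h, oh)))
            padded_img.paste(img, (0, 0))
            padded_mask = Image.new(mask.mode, (max(w, ow), max(h, oh)))
            padded_mask.paste(mask, (0, 0))
            img, mask = padded_img, padded_mask
            w, h = img.size

        # random crop at the same coordinates for image and mask
        x = random.randint(0, w - ow)
        y = random.randint(0, h - oh)
        img = img.crop((x, y, x + ow, y + oh))
        mask = mask.crop((x, y, x + ow, y + oh))
        return {'image': img, 'label': mask}
```

This keeps the scale augmentation but trades it against losing (or zero-padding) part of the scene in each sample, which is the usual compromise in segmentation pipelines.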

I cannot thank you enough for helping me.
