Image data augmentation in pytorch

cvprogrammer · September 14, 2023, 11:03am

Hello Everyone,

How does data augmentation work on images in PyTorch, i.e., how does it work internally? For example, if my dataset has 8 images and I compose a transform as below:

transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((128,128)),
    transforms.RandomVerticalFlip(1),
    transforms.RandomHorizontalFlip(1),
    transforms.ColorJitter(brightness=(0.5,1.5),contrast=(1),saturation=(0.5,1.5),hue=(-0.1,0.1)),
    transforms.ToTensor()
])

Will the PyTorch DataLoader have access to 8*4=32 images? If so, how can these 32 images be displayed (or saved)? Can we access them?

How can augmentation be applied to an image segmentation dataset? In segmentation, we use both an image and a mask. In some cases we don't want to apply an augmentation to the mask (e.g. transforms.ColorJitter). If we pass both the image and the mask to the PyTorch augmentation function together, the augmentation will be applied to both. If we apply it to each separately, then with random augmentations like transforms.RandomVerticalFlip(.5), the augmentation may be applied to the image but not to the mask. How should this situation be handled?

Thank you.

ptrblck · September 14, 2023, 2:27pm

The DataLoader doesn’t have access to the transformation in the standard use case as the samples are loaded and transformed in the Dataset separately.
Here is a small example:

import torch
from torchvision import transforms

transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((128,128)),
    transforms.RandomVerticalFlip(1),    # p=1: always flip
    transforms.RandomHorizontalFlip(1),  # p=1: always flip
    transforms.ColorJitter(brightness=(0.5,1.5),contrast=(1),saturation=(0.5,1.5),hue=(-0.1,0.1)),
    transforms.ToTensor(),
])

# A single input sample yields a single transformed output
x = torch.randn(3, 224, 224)
out = transform(x)

For the segmentation use case, you can either use the functional API as described here, torchvision.transforms.v2, which allows passing multiple objects as described here, or any other library mentioned in the first link.