Increase dataset size using Data Augmentation

Lucky_Magna · April 20, 2021, 7:21pm

Is there any way to increase dataset size using image augmentation in PyTorch, for example by making copies of the same images with variations such as cropping or other techniques available in torchvision transforms? I used the code below, but I want to oversample the dataset and check how that affects the model's performance.

transform = {
    'train': transforms.Compose([
        transforms.RandomResizedCrop(size=256, scale=(0.8, 1.0)),
        transforms.RandomRotation(degrees=15),
        transforms.ColorJitter(),
        transforms.RandomHorizontalFlip(),
        transforms.CenterCrop(size=224),  # ImageNet standards
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])  # ImageNet standards
    ]),
    'val': transforms.Compose([
        transforms.Resize(size=256),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'test': transforms.Compose([
        transforms.Resize(size=256),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

Thank you

cpeters · April 20, 2021, 7:36pm

You mean sample more than the entire dataset per epoch? That’s functionally identical to simply increasing the number of epochs you have when you’re using random transformations.

Lucky_Magna · April 20, 2021, 7:49pm

Well, actually yes, but not within each epoch. Instead, I want to prepare a secondary dataloader containing twice as many images as my original dataloader and use it for training in each epoch. But your answer made me think of using a different set of transforms in each epoch on the original dataset. Can I do that? This may be a silly question, but please help me.
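For reference, a dataloader that yields twice as many samples per epoch could be sketched roughly like this. `RepeatedDataset` and `NoisyToyDataset` are hypothetical names invented for this example; the toy dataset uses random noise as a stand-in for random image transforms so the snippet runs without any image files.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RepeatedDataset(Dataset):
    """Hypothetical wrapper that makes a dataset look `times` as long."""

    def __init__(self, base, times):
        self.base = base
        self.times = times

    def __len__(self):
        return len(self.base) * self.times

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Map the extended index back onto the original samples; the base
        # dataset applies its own random augmentation on each access.
        return self.base[idx % len(self.base)]


class NoisyToyDataset(Dataset):
    """Toy stand-in: 5 samples, with noise playing the role of augmentation."""

    def __init__(self):
        self.data = torch.arange(5, dtype=torch.float32)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx] + 0.01 * torch.randn(())


base = NoisyToyDataset()
doubled = RepeatedDataset(base, times=2)
loader = DataLoader(doubled, batch_size=4, shuffle=True)

# One pass over the loader now visits every original sample twice.
num_samples_per_epoch = sum(batch.shape[0] for batch in loader)
```

The built-in `torch.utils.data.ConcatDataset([base, base])` should give the same doubling without a custom class. Either way, the copies only differ from each other if the base dataset applies random transforms.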

cpeters · April 20, 2021, 8:00pm

You could initialise the dataloader inside the epoch loop with a different set of transforms for every epoch, yes.

Lucky_Magna · April 20, 2021, 8:02pm

Does this regularize the model in any way? If so can you explain why?

cpeters · April 20, 2021, 8:09pm

It probably provides regularisation in the same way as any other augmentation does.

You seem to be suggesting something along the lines of

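for epoch in range(num_epochs):
    # alternate between the two transform pipelines on even and odd epochs
    if epoch % 2 == 0:
        transform = first_transforms
    else:
        transform = second_transforms
    # make the dataset and dataloader with `transform`
    # train for one epoch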

which is functionally identical to having half as many epochs, each of which trains on the dataset twice in sequence, once with each set of transforms. It wouldn’t be as nicely shuffled, but if you’re using the same source images anyway it probably wouldn’t matter.

Lucky_Magna · April 20, 2021, 8:12pm

Thanks a lot, that cleared a lot of things.