Increase dataset size using Data Augmentation

Is there any way to increase the dataset size using image augmentation in PyTorch, such as making copies of the same images with variations like cropping or other techniques available in torchvision transforms? I used the code below, but I want to oversample the dataset and check how that affects the model's performance.

from torchvision import transforms

transform = {
    # Training: random augmentations, then crop to 224 and normalize
    'train': transforms.Compose([
        transforms.RandomResizedCrop(size=256, scale=(0.8, 1.0)),
        transforms.RandomRotation(degrees=15),
        transforms.ColorJitter(),
        transforms.RandomHorizontalFlip(),
        transforms.CenterCrop(size=224),  # ImageNet standards
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])  # ImageNet standards
    ]),
    # Validation: deterministic resize and crop only
    'val': transforms.Compose([
        transforms.Resize(size=256),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    # Test: same deterministic preprocessing as validation
    'test': transforms.Compose([
        transforms.Resize(size=256),
        transforms.CenterCrop(size=224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

Thank you


You mean sample more than the entire dataset per epoch? That’s functionally identical to simply increasing the number of epochs you have when you’re using random transformations.
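To make that concrete, here is a minimal sketch of what "oversampling" with random transforms amounts to. It assumes a toy in-memory dataset (the `RandomFlipDataset` class and the tensor shapes are made up for illustration) in place of your image folder. The point is that the random draw happens on every access, so concatenating the dataset with itself just means each image is visited twice per epoch, each time with a fresh variation.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset


class RandomFlipDataset(Dataset):
    """Toy dataset (hypothetical): applies a random horizontal flip on every access."""

    def __init__(self, images, labels, p=0.5):
        self.images = images  # tensor of shape (N, C, H, W)
        self.labels = labels  # tensor of shape (N,)
        self.p = p

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]
        # The random draw happens here, so each access can return a different variant.
        if torch.rand(1).item() < self.p:
            image = torch.flip(image, dims=[-1])
        return image, self.labels[idx]


images = torch.arange(8 * 3 * 4 * 4, dtype=torch.float32).reshape(8, 3, 4, 4)
labels = torch.arange(8) % 2
base = RandomFlipDataset(images, labels)

# "Oversampling": the same dataset twice, so one epoch visits every image twice.
doubled = ConcatDataset([base, base])
loader = DataLoader(doubled, batch_size=4, shuffle=True)

samples_per_epoch = sum(batch_images.shape[0] for batch_images, _ in loader)
print(len(base), len(doubled), samples_per_epoch)  # 8 16 16
```

The same should hold for a torchvision dataset built with `transform=transform['train']`, since the random transforms are applied when an item is loaded, not once up front. One epoch over `doubled` does the same amount of work as two epochs over `base`.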

Well, actually yes, but not during each epoch; instead, I want to prepare a secondary dataloader with twice as many images as my original dataloader and use it for training in each epoch. But your answer made me think of using a different set of transforms in each epoch on the original dataset. Can I do that? This may be a silly question, but please help me.

You could initialise the dataloader inside the epoch loop with a different set of transforms for every epoch, yes.

Does this regularize the model in any way? If so can you explain why?

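It probably provides regularisation in the same way as any other augmentation does: the model sees varied versions of each image rather than the exact same pixels every time, which makes it harder to memorise the training set.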

You seem to be suggesting something along the lines of

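for epoch in range(num_epochs):
    # Alternate between the two transform pipelines from one epoch to the next
    if epoch % 2 == 0:
        train_transform = first_transforms   # e.g. one transforms.Compose([...])
    else:
        train_transform = second_transforms  # e.g. another transforms.Compose([...])
    # make the dataset and dataloader with train_transform
    # train for one epoch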

which is functionally identical to having half as many epochs and, within each one, training the model on the dataset first with one set of transforms and then with the other. It wouldn’t be as nicely shuffled, but if you’re using the same source images anyway it probably wouldn’t matter.
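If you want to try the alternating scheme, a self-contained toy version could look something like the sketch below. Everything in it is a stand-in: `TransformedDataset`, `flip_horizontal` and `flip_vertical` are hypothetical names, and the two flip functions take the place of your two `transforms.Compose` pipelines. The dataloader is simply rebuilt at the start of every epoch with that epoch's transform.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class TransformedDataset(Dataset):
    """Toy dataset (hypothetical) that applies whichever transform it is given."""

    def __init__(self, images, transform):
        self.images = images
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.transform(self.images[idx])


def flip_horizontal(image):
    return torch.flip(image, dims=[-1])


def flip_vertical(image):
    return torch.flip(image, dims=[-2])


images = torch.rand(6, 3, 4, 4)
transform_sets = [flip_horizontal, flip_vertical]  # stand-ins for two Compose pipelines
used = []

for epoch in range(4):
    transform = transform_sets[epoch % 2]  # alternate between the two sets
    loader = DataLoader(TransformedDataset(images, transform), batch_size=3, shuffle=True)
    for batch in loader:
        pass  # the forward/backward pass would go here
    used.append(transform.__name__)

print(used)
# ['flip_horizontal', 'flip_vertical', 'flip_horizontal', 'flip_vertical']
```

Rebuilding the `DataLoader` each epoch is cheap compared with training, so there is little reason to avoid it here.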

Thanks a lot, that cleared a lot of things.