Unsure how to apply multiple deterministic data augmentation methods

Hello, I am attempting to copy the data augmentation methods applied in a certain paper, but I don’t understand how to do it. (The work in the paper was done using TensorFlow.)

Note that I am a PyTorch noob.

Background (perhaps not important):

I am trying to see whether applying a pre-trained CNN can improve on the results obtained in the paper, so I want to hold other things (such as the data augmentation techniques) more or less constant. That makes it easier to suss out the effect of the pre-trained network.

More detail:

I am working with a dataset containing approximately 14,000 images. In the aforementioned paper, the following augmentations are applied (ignoring resizing):

-five crop
-rotation by 0, 90, 180 and 270 degrees
-reflection about the line y = x.

This increases the dataset by a factor of 5 x 4 x 2 = 40, so from about 14,000 to about 560,000. According to the paper, these augmentations do not appear to be random, so I’m interested in applying them deterministically.

I am aware of torchvision.transforms.functional, but I don’t understand how to use it. I will share some of my code below.

Code:

I’m only sharing part of my code, to avoid making this post too long. My code below uses random transforms because I’ve just been trying to get the network to train and test properly, which it does. When I try to use functional transforms I get errors.

My transforms:

The commented-out lines are things I was experimenting with.

import torch

from torchvision import transforms

#import torchvision.transforms.functional as tf

img_height, img_width = 256, 256

size = [224, 224]

torch.manual_seed(17)

transform = transforms.Compose([
    
                transforms.ToTensor(),
                                            
                transforms.Resize((img_height, img_width)),
    
                transforms.RandomCrop(size),
                transforms.RandomHorizontalFlip(),
                transforms.RandomVerticalFlip(),
                transforms.RandomRotation(180),
    
    
                # transforms.FiveCrop(size),
                # transforms.Lambda(lambda crops: torch.stack([transforms.ToTensor()(crop) for crop in crops])),
                # transforms.Lambda(lambda crops: torch.stack([transforms.PILToTensor()(crop) for crop in crops])),
    
    
#                 tf.five_crop(tf.rotate(img, 90), size),
#                 tf.five_crop(tf.rotate(img, 180), size),
#                 tf.five_crop(tf.rotate(img, 270), size),
                
#                 tf.five_crop(tf.rotate(tf.vflip(img), 90), size),
#                 tf.five_crop(tf.rotate(tf.rotate(tf.vflip(img), 90), 90), size),
#                 tf.five_crop(tf.rotate(tf.rotate(tf.vflip(img), 90), 180), size),
#                 tf.five_crop(tf.rotate(tf.rotate(tf.vflip(img), 90), 270), size),
])

# def transform1(image):
#     return tf.five_crop(tf.resize(image, (img_height, img_width)), size)

My code for training the model is directly copied from this link: https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

My data loader and associated dictionary:

from torch.utils.data import DataLoader

train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=64, shuffle=True)
dataloaders_dict = {
    "train": train_dataloader,
    #"test": test_dataloader # I think the code I'm using uses 'val' instead of 'test', so I'm renaming this to 'val' below
    "val": test_dataloader
}

My hope is that (assuming it’s not too hard) someone can just show me how to write the desired transforms.

Thanks for any help.

How would you like to sample these images during training?
If I understand the use case correctly, you are not planning to apply random transformations, but would like to statically increase the dataset by creating the transformed crops.

If no shuffling is desired, you could apply the transformations directly in the __getitem__ and return all transformed versions of the currently loaded sample (your effective batch size would then grow by a factor equal to the number of transformed versions per sample). On the other hand, if you want to shuffle the samples during training, creating the transformed samples once and storing them locally might be the easier approach.

Thanks for your reply ptrblck. The paper I’m reading says that “we apply several data augmentation techniques… As implemented, these techniques combine to increase the number of images by a factor of 40, from 14,034 to a maximum of 561,360 images.” To me, this seems to mean that there is no randomness involved, and that the dataset is statically increased by a factor of 40.

It is my understanding that it’s standard to shuffle the training data, so every epoch the training proceeds through the training data in a new order. I would assume that that’s the case in this paper. You appear to be telling me that if I want to shuffle the training data every epoch, my best bet might be to create all the transformed images locally, so that I bring a dataset of size 561,360 to PyTorch and then forgo any transformations in PyTorch. I will have to investigate how this might be done.

I also had another idea, but I’m not sure it makes sense. Let’s say, picking simple numbers, that I statically increased my dataset by a factor of 10 using certain transformations, and then trained for 10 epochs. Would it be approximately equivalent to skip the initial (static) data augmentation, instead apply the same transformations randomly in PyTorch, and train for 10 * 10 = 100 epochs? In both cases the network would see the same number of images in total, so I wonder whether this would give me approximately the same result in the end. If so, this might be the simplest, best approach for me.

Thanks for your help.

Yes, I think creating all the transformed images up front would stick to the reference implementation from the paper.
To do so, you could iterate over the Dataset once, creating all additional samples from each original sample, and either store them directly via torch.save(tensor, path) or convert them back to images and save those.
During training you wouldn’t need to transform them anymore and could simply load them.

Yes, I had the same idea of applying the transformations randomly and training for more epochs, but was concerned about the static transformations mentioned in the paper.
While it would give approximately the same results, you would still be applying the transformations randomly, so it wouldn’t be the same implementation.

ptrblck,

I should have replied earlier, but thanks for your help. I need to speak with my advisor to help decide how closely I need to hew to the method used in the paper. The approximate method mentioned above is certainly much simpler.