Data augmentation in PyTorch

Yeah, I use some transforms to do this, such as torchvision.transforms.RandomCrop,
but how do I double (or multiply many times) the dataset? For example, can five_crop make the dataset 5x bigger?
I’m so confused. @smth .

Well, you can see it this way: The new “ONE epoch” is in fact FIVE consecutive old “one epoch” stacked together.


Can you give an example of that? It’s just hard to get this.

Let n denote the size of the original dataset. In ordinary augmentation (i.e. precomputed and static, not on the fly), if we have 5 transformations then the size of the augmented data is 5n, which means the number of iterations per epoch is also 5n. Now, if we augment the data on the fly (with random transformations) using PyTorch, then each epoch still has n iterations. If we concatenate 5 consecutive epochs to create a large epoch (or call it whatever you want), then the total number of iterations in this large epoch is 5n. Thus it is roughly equivalent to static augmentation. (Note that this large epoch is a valid epoch because there are no duplicates among the 5n iterations, since the transformations are random.)
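The counting argument can be sketched in plain Python. `random_transform` below is a made-up stand-in for something like `RandomRotation`: each access draws a fresh random angle, so the 5n samples of the large epoch are almost surely all distinct:

```python
import random

n = 4  # toy dataset size
data = list(range(n))

def random_transform(x):
    # stand-in for a random augmentation, e.g. rotation by a random angle
    return (x, random.uniform(0, 360))

# 5 consecutive on-the-fly epochs -> 5 * n iterations in the "large epoch"
large_epoch = [random_transform(x) for _ in range(5) for x in data]

assert len(large_epoch) == 5 * n       # same count as static 5x augmentation
assert len(set(large_epoch)) == 5 * n  # (almost surely) no duplicates
```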


It’s important to understand that the DataLoader only indexes the files and their target labels. The actual loading of the images happens when `__getitem__()` is called, which is basically when you enumerate the DataLoader. This is usually the line in the program which looks like this -

 for data in train_loader:

It is at this time that the transformations are randomly done.
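A minimal sketch of this lazy behavior, using a made-up `LazyRandomDataset` that fakes a random transform: nothing is transformed until `__getitem__` runs, so fetching the same index twice gives two different results:

```python
import torch
from torch.utils.data import Dataset

class LazyRandomDataset(Dataset):
    """Stores raw data only; the random "transform" runs at access time."""
    def __init__(self, data):
        self.data = data

    def __getitem__(self, index):
        x = self.data[index]
        # the transform happens here, i.e. when the DataLoader enumerates us
        angle = torch.rand(1).item() * 360
        return x, angle

    def __len__(self):
        return len(self.data)

ds = LazyRandomDataset(torch.arange(4).float())
_, a1 = ds[0]    # first access to index 0
_, a2 = ds[0]    # second access: a fresh random angle is drawn
assert a1 != a2  # almost surely different
```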


For clarification:

If you have random transformations, after each epoch you will receive a new set of randomly transformed samples (e.g. rotated by a random number of degrees).
In this case, it’s enough to multiply the number of epochs to get more samples.

But you may want to concatenate all of the generated samples if your transformations
perform deterministic operations (e.g. adding padding).
Then just use ConcatDataset like here: Concatenation while using Data Loader

In my opinion, the first approach is slightly better to avoid overfitting.
After all, you can create a transformation that will be randomly applied or not.
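Here is a rough sketch of the ConcatDataset approach, with toy 1-D tensors instead of images (`PlainDataset` and `PaddedDataset` are illustrative names, and zero-padding stands in for a deterministic image transform):

```python
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset, ConcatDataset

class PlainDataset(Dataset):
    def __init__(self, data):
        self.data = data
    def __getitem__(self, index):
        return self.data[index]
    def __len__(self):
        return len(self.data)

class PaddedDataset(Dataset):
    """Deterministic transform: pad each sample with zeros on both sides."""
    def __init__(self, base, pad=2):
        self.base, self.pad = base, pad
    def __getitem__(self, index):
        return F.pad(self.base[index], (self.pad, self.pad))
    def __len__(self):
        return len(self.base)

base = PlainDataset(torch.randn(10, 8))
# original + padded copy: twice the size
full = ConcatDataset([base, PaddedDataset(base)])

assert len(full) == 20
assert full[0].shape[0] == 8    # untransformed sample
assert full[10].shape[0] == 12  # padded sample (8 + 2 * 2)
```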


Is there any way to implement data augmentation only for some specific CIFAR-10 classes?

You can create your own Dataset and internally load CIFAR10.
In the __getitem__ method you could check what the current target is and apply the transformation based on this condition.


Could you give me an example?

Sure! You could start with the following code:

from torch.utils.data import Dataset, DataLoader
from torchvision import datasets, transforms

class MyDataset(Dataset):
    def __init__(self, train=True, transforms=None):
        self.cifar10 = datasets.CIFAR10(root='YOUR_PATH',
                                        train=train,
                                        download=True)
        self.transforms = transforms

    def __getitem__(self, index):
        x, y = self.cifar10[index]
        if self.transforms:
            print('Choosing transform ', y)
            x = self.transforms[y](x)  # choose the transform based on the class label y
        return x, y

    def __len__(self):
        return len(self.cifar10)

# one transform per class; the index in the list corresponds to the class label
# (RandomHorizontalFlip is just a placeholder here)
class_transforms = []
for _ in range(10):
    transform = transforms.Compose([
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor()
    ])
    class_transforms.append(transform)

dataset = MyDataset(transforms=class_transforms)
loader = DataLoader(dataset, batch_size=5, shuffle=True)
data, target = next(iter(loader))
> Choosing transform  9
Choosing transform  2
Choosing transform  5
Choosing transform  0
Choosing transform  2
> tensor([ 9,  2,  5,  0,  2])

I have just applied random transformations to each class. Your class_transforms will probably look a bit more complicated. :wink:


Thanks, @ptrblck :smile:

Does that mean we will get different randomly transformed samples every epoch, and are unlikely to get the same transformed sample twice?

Maybe imgaug is the best choice.

Yes, I think so. The more transform options we use, the more likely we are to get different samples at every iteration.

There is a new trend in data augmentation right now; I found it used in most super-resolution work. For example, these lines from “Residual Dense Network” (CVPR 2018):

we randomly extract 16 LR RGB patches with the size of 32 × 32 as inputs. We randomly augment
the patches by flipping horizontally or vertically and rotating 90◦ . 1,000 iterations of back-propagation constitute an epoch

which I guess means either they are doing nested looping or they are augmenting the dataset offline before training.
They said that when training on DIV2K they have 1,000 iterations per epoch for this patch size, while you should have 50; it means 20 times more training per epoch.

An epoch can be loosely defined as any number of iterations, not necessarily a pass through the entire training dataset. I think this is the case for the above paper.
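One way such an iteration-based “epoch” is commonly implemented is by wrapping the DataLoader in an infinite generator and running a fixed number of steps. This is only a sketch with toy tensors standing in for the paper’s LR patches, not the authors’ actual code:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def infinite_loader(loader):
    # re-create the iterator on every pass, so shuffling (and any random
    # transforms) are re-drawn each time through the dataset
    while True:
        for batch in loader:
            yield batch

# toy stand-in: 50 random "patches" of shape 3 x 32 x 32
dataset = TensorDataset(torch.randn(50, 3, 32, 32))
loader = DataLoader(dataset, batch_size=1, shuffle=True)

iters_per_epoch = 1000  # "1 epoch = 1000 iterations", independent of dataset size
batches = infinite_loader(loader)

steps = 0
for _ in range(iters_per_epoch):
    (x,) = next(batches)
    # forward / backward / optimizer.step() would go here
    steps += 1

assert steps == 1000  # 20 full passes over the 50-sample dataset
```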

That makes comparison difficult if you don’t know these details from the paper.

Not sure I understood correctly what you said, but that detail was given in the paper: 1 epoch = 1000 iterations.

I meant when these details are not given, which is common in most recent papers: comparing models by number of epochs is then not possible.

So if we use random crop to transform the training images, does it mean that the original images are never used in training? Is it only the transformed images that are fed into the optimization process?