Data augmentation in PyTorch

There is something with PyTorch data augmentation that I would like to understand. I used the following code to create a training data loader:

rgb_mean = (0.4914, 0.4822, 0.4465)
rgb_std = (0.2023, 0.1994, 0.2010)

transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(rgb_mean, rgb_std),
])

kwargs = {'num_workers': 2, 'pin_memory': True} if args.cuda else {}
train_loader = torch.utils.data.DataLoader(
    datasets.CIFAR10(root='./data', train=True,
                     transform=transform_train, download=False),
    batch_size=128, **kwargs)


I guess that data augmentation was applied with two transformations: random crop and random horizontal flip. Thus, I would expect the total number of training samples to be 3 times the size of the CIFAR-10 training set, i.e. 3 * 50000 = 150000. However, the output of the above code is:

len(train_loader) = 391

which means there are approximately 391*128 ~= 50000 samples. Where are the augmented data?

Thank you in advance for your help!



In each epoch the dataloader applies a fresh set of random operations "on the fly". So instead of showing the exact same items at every epoch, you show a variant that has been randomly changed in a different way each time. After three epochs, you will have seen three random variants of each item in the dataset.
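The on-the-fly behavior can be sketched without any PyTorch dependency. The class below is a made-up stand-in for a torch `Dataset`: the random transform runs inside `__getitem__`, so each access may return a different variant, while `len()` never changes.

```python
import random

class FlipDataset:
    """Hypothetical minimal dataset: applies a random horizontal flip
    per access, mimicking what a torchvision transform does in __getitem__."""
    def __init__(self, items):
        self.items = items          # each item: a list standing in for an image row

    def __len__(self):
        return len(self.items)      # the dataset size never grows

    def __getitem__(self, index):
        row = self.items[index]
        if random.random() < 0.5:   # random flip, decided freshly on every access
            row = row[::-1]
        return row

ds = FlipDataset([[1, 2, 3], [4, 5, 6]])
print(len(ds))  # still 2: augmentation does not add samples
for epoch in range(3):
    # the same indices can yield different variants in each epoch
    print('epoch', epoch, [ds[i] for i in range(len(ds))])
```

This is why `len(train_loader)` stays at 391: the augmentation changes what each index returns, not how many indices there are.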

That said: I don’t think your counting method works for estimating the number of samples in the augmented set. The flip doubles the number of pictures, but the crop has many possible outcomes, and you would need to multiply the relative increases rather than add them. (One might also question whether the augmented samples fully count, but that is a different discussion.)

Best regards



Thanks, Thomas! That totally makes sense. So the augmentation happens inside this line:

for (data, target) in dataloader:

Very cool.

The idea is that, according to your words, @tom , every epoch the data is augmented. I’m still puzzled by this: does this mean the data is transformed differently in every epoch? And how would I get 3x the data within one specific epoch, i.e. 50000 * 3 = 150000 samples in a single epoch?

@oneTaken it just means that the data is changed on the fly. The epoch size does not change, you just get randomly transformed samples every epoch. So the concept of a static dataset becomes a bit more dynamic.


Yeah, I use some transforms to do this, such as torchvision.transforms.RandomCrop,
but how do I double (or multiply) the dataset, e.g. use five_crop to make the dataset 5x bigger?
I’m so confused. @smth .

Well, you can see it this way: the new “one epoch” is in fact five consecutive old epochs stacked together.


Can you give an example of that? It’s just hard for me to grasp.

Let n denote the size of the original dataset. In ordinary augmentation (i.e. precomputed and static, not on the fly), if we have 5 transformations then the size of the augmented data is 5n, which means the number of iterations per epoch is also 5n. Now, if we augment the data on the fly (with random transformations) using PyTorch, then each epoch has the same number of iterations n. If we concatenate 5 consecutive epochs to create one large epoch (or call it whatever you want), then the total number of iterations in this large epoch is 5n. Thus it is roughly equivalent to static augmentation. (Note that this large epoch is a valid epoch because there are no duplicates among the 5n iterations, since the transformations are random.)
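The arithmetic can be checked directly with the numbers from this thread (CIFAR-10, batch size 128):

```python
import math

n = 50_000                  # original CIFAR-10 training set size
batch_size = 128
iters_per_epoch = math.ceil(n / batch_size)
print(iters_per_epoch)      # 391, matching len(train_loader) above

# Five consecutive on-the-fly epochs see 5n randomly transformed samples,
# roughly the same as one pass over a static 5x-augmented dataset.
print(5 * iters_per_epoch)  # 1955 iterations
print(5 * n)                # 250000 samples seen
```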


It’s important to understand that the Dataset initially only indexes the files and their target labels. The actual loading of the images happens when the __getitem__ function is called, which is basically when you enumerate the DataLoader. This is usually the line in the program that looks like this:

 for data in train_loader:

It is at this time that the transformations are randomly done.


For clarification:

If you have random transformations, then in each epoch you will receive a new set of randomly transformed samples (e.g. rotated by a random number of degrees).
In this case, it’s enough to increase the number of epochs to get more samples.

But you may want to concatenate all of the generated samples if your transformations
perform deterministic operations (e.g. add padding).
Then just use ConcatDataset like here: Concatenation while using Data Loader

In my opinion, the first approach is slightly better to avoid overfitting.
After all, you can create a transformation that will be randomly applied or not.
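The ConcatDataset idea for deterministic transforms can be sketched in plain Python (the class below is a toy stand-in for torch.utils.data.ConcatDataset, written here only to illustrate why the dataset really grows in this case):

```python
class ConcatLike:
    """Toy stand-in for torch.utils.data.ConcatDataset:
    chains several datasets so the combined length actually grows."""
    def __init__(self, datasets):
        self.datasets = datasets

    def __len__(self):
        return sum(len(d) for d in self.datasets)

    def __getitem__(self, index):
        # walk through the chained datasets to find the right one
        for d in self.datasets:
            if index < len(d):
                return d[index]
            index -= len(d)
        raise IndexError(index)

base = [[1, 2], [3, 4]]
# a deterministic "add padding" transform, applied once, up front
padded = [[0] + row + [0] for row in base]

full = ConcatLike([base, padded])
print(len(full))   # 4: deterministic augmentation does enlarge the dataset
print(full[2])     # [0, 1, 2, 0] -- first item of the padded copy
```

This is the key contrast with random transforms: a deterministic transform always yields the same output, so concatenating the copies is what adds new samples.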


Is there any way to implement data augmentation only for some specific CIFAR-10 classes?

You can create your own Dataset and internally load CIFAR10.
In the __getitem__ method you could check what the current target is and apply the transformation based on this condition.


Could you give me an example?

Sure! You could start with the following code:

class MyDataset(Dataset):
    def __init__(self, train=True, transforms=None):
        self.cifar10 = datasets.CIFAR10(root='YOUR_PATH',
                                        train=train,
                                        download=False)
        self.transforms = transforms

    def __getitem__(self, index):
        x, y = self.cifar10[index]
        if self.transforms:
            print('Choosing transform', y)
            x = self.transforms[y](x)  # choose the class transform based on y
        return x, y

    def __len__(self):
        return len(self.cifar10)

class_transforms = []
for _ in range(10):
    transform = transforms.Compose([
        transforms.RandomRotation(degrees=10),
        transforms.ToTensor(),
    ])
    class_transforms.append(transform)

dataset = MyDataset(transforms=class_transforms)
loader = DataLoader(dataset, batch_size=5, shuffle=True)
data, target = next(iter(loader))
> Choosing transform 9
Choosing transform 2
Choosing transform 5
Choosing transform 0
Choosing transform 2
> tensor([9, 2, 5, 0, 2])

I have just applied random transformations to each class. Your class_transforms will probably look a bit more complicated. :wink:


Thanks, @ptrblck :smile:

Does that mean we will get different randomly transformed samples every epoch, and are unlikely to get the same transformed sample twice?

Maybe imgaug is the best choice.

Yes, I think so. The more transform options we use, the more likely we are to get different samples at every iteration.

There is a new trend in data augmentation right now; I found it used in most super-resolution work, for example these lines in “Residual Dense Network” (CVPR 2018):

we randomly extract 16 LR RGB patches with the size of 32 × 32 as inputs. We randomly augment the patches by flipping horizontally or vertically and rotating 90°. 1,000 iterations of back-propagation constitute an epoch

which I guess means either they are doing nested looping or they are augmenting the dataset offline before training.
They say that training on DIV2K gives them 1,000 iterations per epoch for this patch size, while a single pass over the images would give only 50; that means each epoch is 20 times larger.
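The 20x figure can be checked with some back-of-the-envelope arithmetic (assuming DIV2K's 800 training images, as in the thread):

```python
# Numbers from the paper / thread; DIV2K training set size is an assumption.
images = 800
batch = 16                       # 16 patches extracted per iteration
one_pass_iters = images // batch
print(one_pass_iters)            # 50 iterations for a single pass over the images

iters_per_epoch = 1000           # the paper's definition of an epoch
print(iters_per_epoch // one_pass_iters)  # 20x more patches seen per "epoch"
```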