I guess that data augmentation was used with two transformations: random crop and random horizontal flip. Thus, I would expect the total number of training samples to be 3 times the size of the CIFAR-10 training set, i.e. 3*50000 = 150000. However, the output of the above code is:
len(train_loader) = 391
which means there are approximately 391*128 ~= 50000 samples. Where are the augmented data?
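For reference, here is a minimal sketch of a setup that produces this output (the crop and flip parameters are assumptions, not necessarily the original code):

import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# assumed augmentation pipeline: random crop + random horizontal flip
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

train_set = torchvision.datasets.CIFAR10(root='./data', train=True,
                                         download=True, transform=transform_train)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

print(len(train_set))     # 50000 -- the transforms do not add samples
print(len(train_loader))  # 391 == ceil(50000 / 128) batches per epoch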
In every epoch the DataLoader will apply a fresh set of random operations “on the fly”. So instead of seeing the exact same items in every epoch, you see a variant of each item that has been transformed in a different way each time. After three epochs, you will have seen three random variants of each item in the dataset.
That said, I don’t think your counting method works for estimating the size of the augmented set: the flip doubles the number of possible pictures, but the crop has many potential outcomes, and you would need to multiply the relative increases rather than add them. (One might also question whether augmented samples fully count as new samples, but that is a different discussion.)
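To see the “fresh variant every time” behaviour directly, here is a tiny sketch (the image and transform parameters are just placeholders); applying the random transforms three times to the same image stands in for fetching the same sample in three different epochs:

import numpy as np
import torch
import torchvision.transforms as transforms
from PIL import Image

# one fixed CIFAR-sized image
img = Image.fromarray(np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8))
transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

a, b, c = transform(img), transform(img), transform(img)
print(torch.equal(a, b), torch.equal(b, c))  # usually False False -- different random variants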
So, according to your words, @tom, the data is augmented in every epoch. I’m still puzzled by this: does it mean the data is transformed anew in each epoch? And how would I get 3x the data within a single epoch, i.e. 50000*3 = 150000 samples in one specific epoch?
@oneTaken it just means that the data is changed on the fly. The epoch size does not change, you just get randomly transformed samples every epoch. So the concept of a static dataset becomes a bit more dynamic.
Yeah, I use some transforms to do this, such as torchvision.transforms.RandomCrop,
but how do I double the dataset, or make it several times larger? For example, use five_crop to make the dataset 5x bigger?
I’m so confused. @smth
It’s important to understand that when the dataset is created, it only goes through the files and indexes them together with their target labels. The actual loading of the images happens when the __getitem__() method is called, which is basically when you enumerate the DataLoader. This is usually the line in the program which looks like this:
for data in train_loader:
It is at this point that the random transformations are applied.
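As a rough sketch of what such a map-style dataset does internally (the class and variable names here are illustrative, not the torchvision internals):

from PIL import Image
from torch.utils.data import Dataset

class FolderDataset(Dataset):
    def __init__(self, paths, labels, transform=None):
        self.paths = paths        # only an index of the files...
        self.labels = labels      # ...and their target labels is stored here
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx]).convert('RGB')  # the image is loaded only now
        if self.transform is not None:
            img = self.transform(img)  # random transformations are applied here
        return img, self.labels[idx]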
If you have random transformations, then in each epoch you will receive a new set of randomly transformed samples (e.g. rotated by a random number of degrees).
In this case, it’s enough to increase the number of epochs to get more samples.
But you may want to concatenate all of the generated samples if your transformations perform deterministic operations (e.g. adding a fixed padding).
Then just use ConcatDataset like here: Concatenation while using Data Loader
In my opinion, the first approach is slightly better to avoid overfitting.
After all, you can also make a deterministic transformation random by applying it only with some probability.
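A minimal sketch of the ConcatDataset variant for deterministic transforms (CIFAR-10 and the fixed padding are only placeholders):

import torchvision
import torchvision.transforms as transforms
from torch.utils.data import ConcatDataset, DataLoader

plain = transforms.ToTensor()
# deterministic variant: every image gets the same 4-pixel padding,
# then is resized back to 32x32 so both datasets yield equally sized tensors
padded = transforms.Compose([
    transforms.Pad(4),
    transforms.Resize(32),
    transforms.ToTensor(),
])

plain_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=plain)
padded_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=padded)

train_set = ConcatDataset([plain_set, padded_set])
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
print(len(train_set))  # 100000 -- the epoch really is twice as long now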
You can create your own Dataset and internally load CIFAR10.
In the __getitem__ method you could check what the current target is and apply the transformation based on this condition.
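A sketch of that idea; the wrapper name and the choice of which targets get the extra augmentation are made up for illustration:

import torchvision
import torchvision.transforms as transforms
from torch.utils.data import Dataset

class ConditionalCIFAR10(Dataset):
    """Applies an extra augmentation only to samples of selected classes."""
    def __init__(self, root, targets_to_augment=(0,)):
        # load the raw PIL images; no transform attached here
        self.base = torchvision.datasets.CIFAR10(root=root, train=True, download=True)
        self.targets_to_augment = set(targets_to_augment)
        self.augment = transforms.Compose([
            transforms.RandomCrop(32, padding=4),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
        ])
        self.plain = transforms.ToTensor()

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        img, target = self.base[idx]            # PIL image, integer label
        if target in self.targets_to_augment:   # check the current target
            return self.augment(img), target
        return self.plain(img), target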
There is a newer trend in data augmentation that I found used in most super-resolution work, for example these lines from “Residual Dense Network” (CVPR 2018):
we randomly extract 16 LR RGB patches with the size of 32 × 32 as inputs. We randomly augment the patches by flipping horizontally or vertically and rotating 90°. 1,000 iterations of back-propagation constitute an epoch
which I guess means they are either doing some nested looping, or augmenting the dataset offline before training.
They said that training on DIV2K they have 1,000 iterations per epoch for this patch size, while a single pass over the 800 training images with 16 patches per iteration would only take 800/16 = 50 iterations. That means each epoch sees 20 times more data.
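If the goal is the same behaviour without building an offline copy of the data, one option is to decouple the dataset length from the number of images and cut a random patch in __getitem__. A rough sketch (the patch size and the 1,000-iteration epoch follow the quoted description; everything else is an assumption):

import random
import torch
from torch.utils.data import Dataset, DataLoader

class RandomPatchDataset(Dataset):
    """One 'epoch' is a fixed number of random patch draws, not one pass over the images."""
    def __init__(self, lr_images, patch_size=32, samples_per_epoch=16000):
        self.lr_images = lr_images                  # list of CxHxW image tensors
        self.patch_size = patch_size
        self.samples_per_epoch = samples_per_epoch  # 1000 iterations * 16 patches

    def __len__(self):
        return self.samples_per_epoch               # independent of len(self.lr_images)

    def __getitem__(self, idx):
        img = random.choice(self.lr_images)
        _, h, w = img.shape
        top = random.randint(0, h - self.patch_size)
        left = random.randint(0, w - self.patch_size)
        patch = img[:, top:top + self.patch_size, left:left + self.patch_size]
        if random.random() < 0.5:                   # random horizontal flip
            patch = torch.flip(patch, dims=[2])
        if random.random() < 0.5:                   # random 90-degree rotation
            patch = torch.rot90(patch, k=1, dims=[1, 2])
        return patch

images = [torch.rand(3, 64, 64) for _ in range(8)]  # stand-in for the LR training images
loader = DataLoader(RandomPatchDataset(images), batch_size=16)
print(len(loader))  # 1000 iterations per epoch, regardless of how many images there are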