Data augmentation in PyTorch!

Hello everyone,
I am developing a deep learning network and plan to use data augmentation that adds random noise, e.g. via np.random.randn().
My question is: if I use this method in PyTorch, will the dataset receive new random noise every epoch, so that the exact same dataset is never trained on twice?
If not, how does data augmentation work in PyTorch? Is my original (untransformed) dataset also used for training?

Thank you

It depends on how you implement it. If you have a Dataset and put the augmentation step in __getitem__, it runs every time a sample is fetched, so the noise is unique across epochs.
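A minimal sketch of what that could look like (the data tensor and noise level here are made-up placeholders, not anything from your setup):

```python
import torch
from torch.utils.data import Dataset

class NoisyDataset(Dataset):
    """Adds fresh Gaussian noise each time a sample is fetched,
    so every epoch sees a differently perturbed copy of the data."""

    def __init__(self, data, noise_std=0.1):
        self.data = data            # tensor of shape (N, ...), placeholder
        self.noise_std = noise_std

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]
        # torch.randn_like draws new noise on every call,
        # i.e. a new perturbation every epoch
        return x + self.noise_std * torch.randn_like(x)

data = torch.zeros(4, 3)            # toy dataset
ds = NoisyDataset(data)
a = ds[0]
b = ds[0]                           # same index, different noise
```

Fetching the same index twice returns two different tensors, which is exactly why the augmented samples differ across epochs.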

Hi,

There is a good amount of discussion in this thread

Thank you for all of your answers!
@Jeremy_Tavrisov, so in that case, would my model ever see the original dataset (i.e. before the transformation)?
I also want to ask: wouldn’t that prevent accuracy or performance from improving, since no identical dataset is trained on repeatedly across epochs?
And how can I transform my dataset so that it is not augmented differently every epoch?

Again, this depends on your implementation. In my experience, on-the-fly augmentation generally works better: since no identical dataset is trained on repeatedly, there is less for the model to overfit on. Augmentations are generally small perturbations of the original set, so the model is still effectively learning from the same data.

If you wanted to, you could augment ahead of time, before training, so that you have one fixed set of transformed samples, or design your augmentations so that they are not random across epochs.
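Two sketches of those options (the data, noise level, and per-sample seeding scheme are illustrative assumptions, not a prescribed recipe):

```python
import torch
from torch.utils.data import TensorDataset

data = torch.zeros(4, 3)            # toy dataset, placeholder

# Option 1: augment once, ahead of time, so training always sees
# the same single set of transformed samples
augmented = data + 0.1 * torch.randn_like(data)
fixed_ds = TensorDataset(augmented)

# Option 2: derive the noise from a per-sample seed, so the "random"
# perturbation is identical every epoch
def deterministic_noise(x, idx, std=0.1):
    g = torch.Generator().manual_seed(idx)   # same seed -> same noise
    return x + std * torch.randn(x.shape, generator=g)

a = deterministic_noise(data[0], 0)
b = deterministic_noise(data[0], 0)          # identical across "epochs"
```

With the seeded generator, repeated fetches of the same index produce the same perturbed sample, so the dataset no longer changes between epochs.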

Thanks for the help, I will try!!