Data augmentation in PyTorch

An epoch can be loosely defined as any number of iterations, not necessarily a pass through the entire training dataset. I think this is the case for the above paper.

That makes comparison difficult if you don't know these details for a given paper.

Not sure I understood correctly what you said, but that detail was given in the paper: 1 epoch = 1000 iterations.

I meant cases where these details are not given, which is common in many recent papers; then comparing models by number of epochs is not possible.

So if we use random crop to transform the training images, does it mean that the original images are never used in training? Is it only the transformed images that are fed into the optimization process?

Random crop assumes that the original image is larger than the target input size. So you cannot feed the originals as is.

Best regards

Thomas

Thank you for your reply!

Then what about other transformations such as random horizontal flip? Are the originals never used as well?

Random horizontal flip flips the image with the given probability, so with probability p you’ll get the flipped image, with 1-p you’ll get the original image.

I know this is an old post by now, but doesn’t your code snippet leak the output class to the trainer? i.e. you might make all of the cat pictures light and all of the dog pictures dark, and then the network could learn to identify things based on your augmentation, rather than what’s actually in the picture.

I’m new to PyTorch, and came here while googling around to find out whether my augmentation should subclass Dataset or DataLoader, so maybe I missed something subtle.

Yes, this could be the case and I’m not familiar with @Josiane_Rodrigues’s use case, so she might give you more information on it. 🙂

Maybe she is dealing with a custom dataset, which “needs” the different preprocessing steps?

Hi @Andrew_Wagner, at the time I hadn’t thought of treating data that way, but it’s a good idea and it can work.

This thread has been very helpful in understanding PyTorch augmentation. However, I am still left with some questions:
~ If I am applying on-the-fly augmentation transforms, then I have to wait more epochs to cover all the possible transformed images. Is there a custom way of covering all these possible transformed images in one epoch?
~ Is there a way to set a max limit on the number of transformed images I want to use in one epoch? (i.e., if I want to increase the number of back-prop iterations beyond len(dataset)//batch_size)

A code snippet would be much appreciated.

Thank you

  1. While you could calculate the number of all transformed images for e.g. a random flip along one dimension (it would simply double the dataset), this isn’t feasible for other transformations, such as random rotation, which samples floating-point angles. The number of “all possible transformed images” could in theory be computed, but it wouldn’t be very useful, as it would be huge.

  2. You could manipulate the length of the Dataset by changing the return value in Dataset.__len__(self).
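A sketch of point 2: a hypothetical wrapper (the name RepeatedDataset is my own, not a PyTorch API) that inflates `__len__` so one “epoch” iterates the underlying dataset several times. Because on-the-fly transforms are re-applied on every access, the repeats yield differently augmented samples:

```python
from torch.utils.data import Dataset

class RepeatedDataset(Dataset):
    """Makes one epoch cover the wrapped dataset `repeats` times."""
    def __init__(self, dataset, repeats=4):
        self.dataset = dataset
        self.repeats = repeats

    def __len__(self):
        # The DataLoader uses this to decide how many samples one epoch has.
        return len(self.dataset) * self.repeats

    def __getitem__(self, index):
        # Wrap around; any random transform inside the wrapped dataset
        # is re-sampled on each access, so repeats differ.
        return self.dataset[index % len(self.dataset)]

# Tiny demo with a toy dataset (hypothetical, for illustration only):
class Squares(Dataset):
    def __len__(self):
        return 5
    def __getitem__(self, i):
        return i * i

epoch = RepeatedDataset(Squares(), repeats=3)
print(len(epoch))  # 15
print(epoch[7])    # same underlying item as epoch[2], i.e. 4
```

Wrapping this in a regular DataLoader then gives you `repeats * len(dataset) // batch_size` iterations per epoch.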
