Dataloader with two sets of augmentations on the same image

Hi,
I am trying to apply a weak and a strong augmentation to the same set of images while maintaining the correspondence between the two views. I tried ConcatDataset: although the images come from the two sets of augmentations, it doesn’t maintain the one-to-one correspondence between the first half and the second half of the batch.
Please let me know if you have any idea.
Thank You

ConcatDataset will concatenate multiple Datasets sequentially, i.e. the length of the ConcatDataset will be the sum of the lengths of all passed Datasets.
If you want to augment the same sample in a different way, you could use different transformations in the __getitem__ of the Dataset and return both transformed samples.
This would double your batch size, so you could lower it in the DataLoader if needed.
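A minimal sketch of that idea (the file loading, the weak/strong pipelines, and names such as TwoAugDataset are illustrative assumptions, not code from this thread):

```python
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class TwoAugDataset(Dataset):
    def __init__(self, image_paths, targets):
        self.image_paths = image_paths
        self.targets = targets
        # hypothetical "weak" and "strong" pipelines; swap in your own
        self.weak = transforms.Compose([
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
        ])
        self.strong = transforms.Compose([
            transforms.RandomHorizontalFlip(),
            transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
            transforms.RandomGrayscale(p=0.2),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, index):
        img = Image.open(self.image_paths[index]).convert("RGB")
        # both views are created from the same loaded image, so the
        # weak/strong correspondence is preserved by construction
        return self.weak(img), self.strong(img), self.targets[index]
```

Each batch then yields aligned weak and strong tensors for the same underlying images, which is why the effective batch size doubles.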

Hi, I’m trying to do something similar, where each batch of bs samples should contain bs/n images, each augmented n times (where n is fixed for a model run), i.e. each batch should be:
(t_1(x_1), … t_n(x_1))
(t_1(x_2), … t_n(x_2))
…
The transformations can all be defined by the same (randomised) function, but it should be called n times to give different results.
I have tried constructing a transform whose __call__(self, x) outputs:
torch.cat([self.train_transform(x) for _ in range(n)], 0) (or torch.stack can be used),
so a pre-defined “transform” is applied independently n times.
(Note: I previously used [transform(x)] * n, but I think that may give n copies of the same transformed x?)
… but this seems very slow (one batch of bs samples with n=1 transform is much faster than one batch of bs/n samples with n>1 transforms, even though both produce the same number of transformed samples).
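For reference, a self-contained sketch of the wrapper described above (the name NTimesTransform is made up here). torch.stack re-runs the random pipeline n times and stacks the distinct results, whereas [transform(x)] * n would indeed just repeat one transformed copy n times:

```python
import torch

class NTimesTransform:
    """Apply the same random transform pipeline n times independently
    and stack the results along a new leading dimension."""

    def __init__(self, train_transform, n):
        self.train_transform = train_transform
        self.n = n

    def __call__(self, x):
        # each call re-samples the random parameters, so the n views differ
        return torch.stack([self.train_transform(x) for _ in range(self.n)], dim=0)
```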

Is there a “best” way to do this? e.g.

  • using ConcatDataset to pull together n Datasets (but this seems heavy-handed, as the x’s are the same and the transform function is/can be the same?)
  • or through some smarter way of calling the same transform and combining
  • or something else…

Many thanks

I don’t know how many workers you are using in the DataLoader (specified via num_workers), but note that you are increasing the work of each worker by a factor of n, since each one now loads and transforms n samples.
You could increase the number of workers to check if loading the data in the background helps.
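For example (a sketch only; the batch size, worker count, and the names dataset, bs, n follow the notation above and are assumptions):

```python
from torch.utils.data import DataLoader

# each sample expands into n transformed views, so the per-batch sample
# count is reduced to bs // n, and more workers prepare batches in the
# background while the GPU is busy
loader = DataLoader(
    dataset,
    batch_size=bs // n,
    shuffle=True,
    num_workers=8,   # illustrative value; tune to your machine
    pin_memory=True,
)
```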