Dataloader with two sets of augmentations on the same image

Hi,
I am trying to apply a weak and a strong augmentation to the same set of images while maintaining the correspondence between the two views. I tried ConcatDataset: although the images come from the two sets of augmentations, it doesn’t maintain the one-to-one correspondence between the first half and the second half of the batch.
Please let me know if you have any idea.
Thank You

ConcatDataset will concatenate multiple Datasets sequentially, i.e. the length of the ConcatDataset will be the sum of the lengths of all passed Datasets.
If you want to augment the same sample in a different way, you could use different transformations in the __getitem__ of the Dataset and return both transformed samples.
This would double your batch size, so you could lower it in the DataLoader if needed.
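A minimal sketch of that idea (the file loading, the weak/strong pipelines, and names such as TwoAugDataset are illustrative assumptions, not code from this thread):

```python
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class TwoAugDataset(Dataset):
    def __init__(self, image_paths, targets):
        self.image_paths = image_paths
        self.targets = targets
        # hypothetical "weak" and "strong" pipelines; swap in your own
        self.weak = transforms.Compose([
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
        ])
        self.strong = transforms.Compose([
            transforms.RandomHorizontalFlip(),
            transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
            transforms.RandomGrayscale(p=0.2),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, index):
        img = Image.open(self.image_paths[index]).convert("RGB")
        # both views are created from the same loaded image, so the
        # weak/strong correspondence is preserved by construction
        return self.weak(img), self.strong(img), self.targets[index]
```

Each batch then yields aligned weak and strong tensors for the same underlying images, which is why the effective batch size doubles.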

Hi, I’m trying to do something similar, where each batch of bs samples should contain bs/n images, each augmented n times (where n is fixed for a model run), i.e. each batch should be:
(t_1(x_1), … t_n(x_1))
(t_1(x_2), … t_n(x_2))
…
The transformations can all be defined by the same (randomised) function, but it should be called n times to give different results.
I have tried constructing a transform whose __call__(self, x) outputs:
torch.cat([self.train_transform(x) for _ in range(n)], 0) (or torch.stack can be used),
so a pre-defined “transform” is applied independently n times.
(Note: I previously used [transform(x)] * n, but I think that may give n copies of the same transformed x?)
… but this seems very slow (one batch of bs samples with n=1 transform is much faster than one batch of bs/n samples with n>1 transforms, even though both produce the same number of transformed samples).
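For reference, a self-contained sketch of the wrapper described above (the name NTimesTransform is made up here). torch.stack re-runs the random pipeline n times and stacks the distinct results, whereas [transform(x)] * n would indeed just repeat one transformed copy n times:

```python
import torch

class NTimesTransform:
    """Apply the same random transform pipeline n times independently
    and stack the results along a new leading dimension."""

    def __init__(self, train_transform, n):
        self.train_transform = train_transform
        self.n = n

    def __call__(self, x):
        # each call re-samples the random parameters, so the n views differ
        return torch.stack([self.train_transform(x) for _ in range(self.n)], dim=0)
```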

Is there a “best” way to do this? e.g.

  • using ConcatDataset to pull together n Datasets (but this seems heavy-handed, as the x’s are the same and the transform function is/can be the same?)
  • or through some smarter way of calling the same transform and combining
  • or something else…

Many thanks

I don’t know how many workers you are using in the DataLoader (specified via num_workers), but note that you are increasing the work of each worker by a factor of n, since each one now loads and transforms n samples.
You could increase the number of workers to check if loading the data in the background helps.
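For example (a sketch only; the batch size, worker count, and the names dataset, bs, n follow the notation above and are assumptions):

```python
from torch.utils.data import DataLoader

# each sample expands into n transformed views, so the per-batch sample
# count is reduced to bs // n, and more workers prepare batches in the
# background while the GPU is busy
loader = DataLoader(
    dataset,
    batch_size=bs // n,
    shuffle=True,
    num_workers=8,   # illustrative value; tune to your machine
    pin_memory=True,
)
```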