Hi,
I am trying to apply weak and strong augmentations to the same set of images while maintaining the correspondence between them. I tried `ConcatDataset`: the images do come from the two sets of augmentations, but the first half and the second half of the batch don't maintain a one-to-one correspondence.
Please let me know if you have any ideas.
Thank You
`ConcatDataset` will concatenate multiple `Dataset`s sequentially, i.e. the `len` of the `ConcatDataset` will be the sum of all passed `Dataset`s.
If you want to augment the same sample in a different way, you could use different transformations in the `__getitem__` of the `Dataset` and return both transformed samples. This would double your batch size, so you could lower it in the `DataLoader`, if needed.
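A minimal sketch of that idea (the dataset layout, transform pipelines, and paths here are assumptions for illustration; any weak/strong pipelines would work, `T.RandAugment` is just one option available in recent torchvision versions):

```python
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms as T
from PIL import Image

class PairedAugDataset(Dataset):
    """Returns a (weak, strong) pair of views of the same image."""
    def __init__(self, image_paths, weak_transform, strong_transform):
        self.image_paths = image_paths
        self.weak_transform = weak_transform
        self.strong_transform = strong_transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img = Image.open(self.image_paths[idx]).convert("RGB")
        # both pipelines see the identical source image, so index i in the
        # weak batch corresponds to index i in the strong batch
        return self.weak_transform(img), self.strong_transform(img)

weak = T.Compose([T.Resize((224, 224)), T.RandomHorizontalFlip(), T.ToTensor()])
strong = T.Compose([T.Resize((224, 224)), T.RandAugment(), T.ToTensor()])

# loader = DataLoader(PairedAugDataset(paths, weak, strong), batch_size=64)
# for weak_batch, strong_batch in loader:
#     ...  # weak_batch[i] and strong_batch[i] come from the same image
```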
Hi, I’m trying to do something similar where each batch of bs samples should contain (bs/n) images each augmented n times (where n is fixed for a model run), i.e. each batch should be:
(t_1(x_1), …, t_n(x_1))
(t_1(x_2), …, t_n(x_2))
…
The transformations can all be defined by the same (randomised) function, but it should be called n times to give different results.
I have tried constructing a transform whose `__call__(self, x)` outputs:
`torch.cat([self.train_transform(x) for _ in range(n)], 0)` (or `torch.stack` can be used)
so a pre-defined transform is applied independently n times.
(Note: I previously used `[transform(x)] * n`, but that indeed gives n references to a single transformed x, since `transform(x)` is only evaluated once.)
… but this seems very slow: one batch of bs samples with n=1 transform is much faster than one batch of bs/n samples with n>1 transforms, even though both produce the same number of transformed samples.
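For concreteness, a sketch of that wrapper as described above (the class and attribute names are hypothetical):

```python
import torch

class NViewTransform:
    """Applies the same random transform pipeline n times to one sample."""
    def __init__(self, train_transform, n):
        self.train_transform = train_transform
        self.n = n

    def __call__(self, x):
        # each call re-samples the random parameters, so the n views differ;
        # [self.train_transform(x)] * n would instead repeat one view n times
        return torch.stack([self.train_transform(x) for _ in range(self.n)], 0)
```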
Is there a "best" way to do this? e.g.
- using ConcatDataset to pull together n Datasets (but this seems heavy-handed, as the x's are the same and the transform function is/can be the same)?
- or through some smarter way of calling the same transform and combining the results
- or something else…
Many thanks
I don’t know how many workers in the `DataLoader` you are using (specified via `num_workers`), but note that you are increasing the work of each worker by a factor of `n`, since each one loads and transforms `n` samples.
You could increase the number of workers to check if loading the data in the background could help.
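For example (the `dataset`, `bs`, `n`, and worker count below are placeholders, not a recommendation):

```python
from torch.utils.data import DataLoader

# each sample now yields n views, so the effective batch stays at bs samples;
# more workers let the n-fold transform work run in parallel in the background
loader = DataLoader(
    dataset,            # e.g. a Dataset whose transform returns n stacked views
    batch_size=bs // n,
    num_workers=8,      # illustrative; tune for your machine
    pin_memory=True,
)
```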