Selective Augmentation/Modifying DataLoader

Hi all,

A few questions. First, we want to compute some metrics during training and, after each epoch, use them to select 10% of the examples in the dataset and apply augmentation only to those. If we have a custom dataset, is it best to subclass DataLoader on top of our Dataset class? What's the best way to change which examples get augmented from epoch to epoch?

You could use an internal attribute in your custom Dataset to tag the samples that should be augmented, and check it as a condition in __getitem__. Start with it empty, then update it after each epoch by accessing the dataset directly through the DataLoader (loader.dataset).
This approach should work as long as you modify loader.dataset after the epoch has finished and use persistent_workers=False. With num_workers > 0, each worker process holds its own copy of the dataset. Persistent workers keep that copy across epochs, so changes made in the main process would never reach them.
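A minimal sketch of the tagging approach, assuming a small tensor-backed dataset. The names `SelectiveAugDataset`, `augment_indices`, and `select_hardest`, the noise augmentation, and the "highest per-sample loss" selection rule are hypothetical choices for illustration, not part of PyTorch or the original thread. The dataset also returns each sample's index so per-sample metrics can be mapped back to dataset positions.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class SelectiveAugDataset(Dataset):
    """Hypothetical dataset: augments only the indices in self.augment_indices."""

    def __init__(self, data, targets, augment_fn):
        self.data = data
        self.targets = targets
        self.augment_fn = augment_fn
        # Empty at the start, so nothing is augmented in the first epoch.
        self.augment_indices = set()

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]
        if idx in self.augment_indices:  # the tag is checked per sample
            x = self.augment_fn(x)
        # Return the index too, so per-sample metrics can be stored by position.
        return x, self.targets[idx], idx


def select_hardest(per_sample_loss, fraction=0.1):
    """Hypothetical rule: pick the `fraction` of samples with the highest loss."""
    k = max(1, int(len(per_sample_loss) * fraction))
    return set(torch.topk(per_sample_loss, k).indices.tolist())


def add_noise(x):
    # Placeholder augmentation; swap in real transforms.
    return x + 0.1 * torch.randn_like(x)


torch.manual_seed(0)
data = torch.randn(100, 3)
targets = torch.randint(0, 2, (100,))
dataset = SelectiveAugDataset(data, targets, add_noise)
# num_workers=0 here; with num_workers > 0, keep persistent_workers=False
# so every epoch's workers get a fresh copy of the updated dataset.
loader = DataLoader(dataset, batch_size=16, shuffle=True, num_workers=0)

model = torch.nn.Linear(3, 2)
criterion = torch.nn.CrossEntropyLoss(reduction="none")  # per-sample losses
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(3):
    per_sample_loss = torch.zeros(len(dataset))
    for x, y, idx in loader:
        loss = criterion(model(x), y)
        optimizer.zero_grad()
        loss.mean().backward()
        optimizer.step()
        per_sample_loss[idx] = loss.detach()
    # Epoch is done: retag the samples through the loader's dataset.
    loader.dataset.augment_indices = select_hardest(per_sample_loss, 0.1)
```

Because the iterator is recreated at the start of each `for x, y, idx in loader` loop, the next epoch picks up the new tags without rebuilding the DataLoader.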


Thanks for this! I think the solution we went with is similar to this answer you posted, where we create the DataLoader on the fly each epoch with minimal slowdown 🙂
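A rough sketch of the "recreate the DataLoader every epoch" variant, assuming the current augmentation set is passed into a new dataset each time. `FrozenAugDataset`, `make_loader`, `add_noise`, and the norm-based stand-in metric are hypothetical names and choices, not the poster's actual code.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class FrozenAugDataset(Dataset):
    """Hypothetical dataset whose augmentation set is fixed at construction."""

    def __init__(self, data, targets, augment_indices, augment_fn):
        self.data = data
        self.targets = targets
        self.augment_indices = frozenset(augment_indices)
        self.augment_fn = augment_fn

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]
        if idx in self.augment_indices:
            x = self.augment_fn(x)
        return x, self.targets[idx], idx


def add_noise(x):
    # Placeholder augmentation; module-level so it pickles for worker processes.
    return x + 0.1 * torch.randn_like(x)


def make_loader(data, targets, augment_indices, **loader_kwargs):
    # A new dataset and DataLoader per epoch, so workers always see the
    # current augmentation set, whatever persistent_workers is set to.
    dataset = FrozenAugDataset(data, targets, augment_indices, add_noise)
    return DataLoader(dataset, batch_size=16, shuffle=True, **loader_kwargs)


torch.manual_seed(0)
data = torch.randn(100, 3)
targets = torch.randint(0, 2, (100,))
augment_indices = set()

for epoch in range(3):
    loader = make_loader(data, targets, augment_indices)
    per_sample_metric = torch.zeros(len(data))
    for x, y, idx in loader:
        # Stand-in for a real per-sample metric such as the loss (hypothetical).
        per_sample_metric[idx] = x.norm(dim=1)
    k = int(0.1 * len(data))
    augment_indices = set(torch.topk(per_sample_metric, k).indices.tolist())
```

Constructing a DataLoader object is cheap. With num_workers > 0 the main cost is starting the worker processes, which already happens every epoch when persistent_workers=False, so rebuilding the loader adds little on top.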