A few questions. First, we want to compute some metrics during training and, after each epoch, take 10% of the examples based on those metrics and apply augmentation only to those examples from the dataset. If we have a custom dataset, is it best to subclass the DataLoader class on top of a Dataset class? What's the best way to change which examples we augment from epoch to epoch?
You could use an internal attribute in your custom Dataset to tag the samples that should be augmented, and use it as a condition in __getitem__. At the beginning you could leave it empty, then manipulate it after each epoch by directly accessing the internal dataset through the DataLoader.
This approach should work if you access loader.dataset after each epoch is done and if persistent_workers=False is used (persistent workers keep their own copy of the dataset, so they would not see the change).
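A minimal sketch of this idea might look like the following. The names TaggedAugmentDataset, augment_indices, and select_top_fraction are made up for illustration, the "top 10% by score" rule is only an assumption about how the samples get picked, and the metric and augmentation are placeholders. The dataset returns the sample index alongside the data so that per-sample metrics can be collected during the epoch.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class TaggedAugmentDataset(Dataset):
    """Map-style dataset that augments only the samples whose index is tagged."""

    def __init__(self, data, augment_fn):
        self.data = data
        self.augment_fn = augment_fn
        self.augment_indices = set()  # empty at the start: nothing is augmented

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        x = self.data[idx]
        if idx in self.augment_indices:
            x = self.augment_fn(x)
        return x, idx  # return the index so per-sample metrics can be tracked


def select_top_fraction(scores, fraction=0.1):
    """Hypothetical selection rule: indices of the highest-scoring samples."""
    k = max(1, int(len(scores) * fraction))
    return set(torch.topk(scores, k).indices.tolist())


# Toy data and a placeholder "augmentation" (sign flip) for illustration only.
dataset = TaggedAugmentDataset(
    torch.arange(20, dtype=torch.float32), augment_fn=lambda x: -x
)
loader = DataLoader(dataset, batch_size=5, shuffle=True, persistent_workers=False)

for epoch in range(2):
    scores = torch.zeros(len(dataset))
    for x, idx in loader:
        scores[idx] = x.abs()  # placeholder metric; use your real per-sample metric
    # After the epoch is done, update the tags through loader.dataset.
    loader.dataset.augment_indices = select_top_fraction(scores, fraction=0.1)
```

With num_workers > 0, the updated attribute should only be picked up in the next epoch, when the workers are recreated, which is why persistent_workers=False matters here.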
Thanks for this! I think the solution we went with is similar to this answer you posted, where we create the DataLoader on the fly each time with minimal slowdown.