Excellent info @Nikronic. That matches what I see at runtime. Any thoughts on why the PyTorch team took this approach rather than a more static augmentation option? Changing the training data on the fly, so to speak, could have both positive and negative effects.
I do not know exactly why this approach was chosen, but it solved my problem even on huge datasets like Places, which has about 1.8 million images. I augmented it to 6x its original size and everything worked fine!
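A rough sketch of the idea, not my exact code: the dataset path is a placeholder, and `ConcatDataset` is just one way to emulate a "6x bigger" dataset. The key point is that random transforms are re-sampled on every `__getitem__` call, so each pass over the data sees different augmented versions.

```python
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms

# Random transforms draw fresh parameters every time an item is loaded,
# so augmentation happens on the fly rather than being stored on disk.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

# Placeholder path; point this at your own dataset.
base = datasets.ImageFolder("data/places/train", transform=train_transform)

# One way to mimic a statically enlarged (6x) dataset: concatenate six
# references to the same dataset. Each copy still applies fresh random
# transforms per access, so no extra images are written to disk.
augmented = ConcatDataset([base] * 6)

loader = DataLoader(augmented, batch_size=64, shuffle=True, num_workers=4)
```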
Maybe if you explain your question more specifically, one of the PyTorch developers will answer you.