Shuffling within batches after each epoch

This is probably more of a Stack Overflow kind of question, but I've [found](https://stats.stackexchange.com/questions/235844/should-training-samples-randomly-drawn-for-mini-batch-training-neural-nets-be-dr) a [bunch](https://datascience.stackexchange.com/questions/10204/should-i-take-random-elements-for-mini-batch-gradient-descent) of links that I'm not really sure answer the question entirely. It's a practical issue as well.

In torchtext, the batch iterator shuffles the training data once, splits it into batches, and then returns those batches indefinitely in a random order. The composition of each batch, and the order of the observations within it, is always the same. It seems unlikely this will cause any issues with training, but all the same I was wondering whether it's considered more optimal to shuffle within each batch after every epoch?
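
To make the distinction concrete, here's a toy sketch of the behaviour I'm describing (the helper names here are made up for illustration, not the actual torchtext code):

```python
import random

def make_fixed_batches(examples, batch_size):
    """Hypothetical helper: shuffle the data once up front, then split into batches."""
    shuffled = random.sample(examples, len(examples))
    return [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]

def iterate_epochs(examples, batch_size, num_epochs):
    """Each epoch yields the same batches, just in a different random order."""
    batches = make_fixed_batches(examples, batch_size)
    for _ in range(num_epochs):
        for idx in random.sample(range(len(batches)), len(batches)):
            yield batches[idx]

# The same groups of observations recur every epoch; only the batch order varies.
data = list(range(10))
for batch in iterate_epochs(data, batch_size=3, num_epochs=2):
    print(batch)
```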

You can get both behaviors in torchtext (I believe it's shuffle=True), but I think it's still unclear which is better for a given task; I've seen significant effects in both directions but haven't carried out any sort of systematic analysis.
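
For contrast, here's a toy sketch of the alternative behaviour, where the training set is reshuffled every epoch so that the batch composition changes. Whether shuffle=True maps onto exactly this is my assumption, so it's worth checking against the torchtext docs:

```python
import random

def iterate_epochs_reshuffled(examples, batch_size, num_epochs):
    """Reshuffle the full training set each epoch, so batch composition changes."""
    for _ in range(num_epochs):
        shuffled = random.sample(examples, len(examples))
        for i in range(0, len(shuffled), batch_size):
            yield shuffled[i:i + batch_size]

# Unlike the fixed-batch scheme above, which observations end up grouped
# together differs from epoch to epoch.
data = list(range(10))
for batch in iterate_epochs_reshuffled(data, batch_size=3, num_epochs=2):
    print(batch)
```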


Good call, I'd missed the shuffle option. Good to know it's another thing to keep track of as a possible influence on training performance.