Shuffling within batches after each epoch

This is probably more of a Stack Overflow kind of question, but I’ve found a bunch of links that I’m not really sure answer the question entirely. It’s kind of a practical issue as well.

In torchtext the batch iterator shuffles the training data, puts it into batches, and then infinitely returns those batches in a random order. The order of the observations within each batch is always the same. It seems unlikely this will cause any issues with training, but all the same I was wondering whether it’s considered better to reshuffle within each batch after every epoch?

You can get both behaviors in torchtext (I believe it’s shuffle=True), but I also think it’s still unclear which is better for a given task: I’ve seen significant effects in both directions, though I haven’t carried out any sort of systematic analysis.
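To make the distinction concrete, here’s a pure-Python sketch of the two behaviors (no torchtext; the function names here are made up for illustration). Strategy A fixes the batch composition once and only reorders the batches each epoch; strategy B reshuffles all examples each epoch, so batch membership changes too:

```python
import random

def fixed_batches(data, batch_size, seed=0):
    # Strategy A: shuffle once, fix the batches; per epoch only the
    # ORDER of the batches varies, never their membership.
    rng = random.Random(seed)
    pool = data[:]
    rng.shuffle(pool)
    batches = [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]
    def epoch():
        order = batches[:]
        rng.shuffle(order)  # reorder batches, keep their contents fixed
        return order
    return epoch

def reshuffled_batches(data, batch_size, seed=0):
    # Strategy B: reshuffle all examples every epoch, then re-batch,
    # so the composition of each batch changes between epochs.
    rng = random.Random(seed)
    def epoch():
        pool = data[:]
        rng.shuffle(pool)
        return [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]
    return epoch

data = list(range(8))
epoch_a = fixed_batches(data, batch_size=4)
epoch_b = reshuffled_batches(data, batch_size=4)

# Under strategy A the SET of batches is identical every epoch:
sets_a = [set(map(frozenset, epoch_a())) for _ in range(3)]
print(all(s == sets_a[0] for s in sets_a))
# Under strategy B every epoch still covers the full dataset exactly once:
print(set().union(*map(set, epoch_b())) == set(data))
```

This is roughly the difference between shuffling once up front versus passing shuffle=True so the iterator reshuffles each epoch; which one helps more seems to be task-dependent, as noted above.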


Good call, I missed the shuffle option. Good to know it’s another thing to keep track of as a possible influence on training performance.