This is probably more of a Stack Overflow kind of question, but I've [found](https://stats.stackexchange.com/questions/235844/should-training-samples-randomly-drawn-for-mini-batch-training-neural-nets-be-dr) a [bunch](https://datascience.stackexchange.com/questions/10204/should-i-take-random-elements-for-mini-batch-gradient-descent) of links that I'm not sure fully answer the question. It's a practical issue as well.
In torchtext, the batch iterator shuffles the training data once, puts it into batches, and then infinitely returns those batches in a random order. The order of the observations within each batch is always the same. It seems unlikely this will cause any issues with training, but all the same, I was wondering whether it's considered better practice to reshuffle the examples across batches after every epoch?
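To make the distinction concrete, here's a minimal sketch (plain Python, not torchtext; the `batches` helper and names are my own) contrasting the two behaviors: drawing a fresh permutation each epoch changes both batch membership and within-batch order, whereas fixing the permutation once reproduces the same batches (only their order varies), which is what torchtext seems to do.

```python
import random

def batches(data, batch_size, indices=None):
    """Yield mini-batches of `data` in the order given by `indices`.

    Passing a fresh random permutation each epoch reshuffles examples
    across batches; reusing one fixed permutation reproduces the same
    batches every epoch (hypothetical helper, not a torchtext API).
    """
    if indices is None:
        indices = list(range(len(data)))
        random.shuffle(indices)  # new permutation for this epoch
    for start in range(0, len(indices), batch_size):
        yield [data[i] for i in indices[start:start + batch_size]]

data = list(range(10))

# Reshuffle each epoch: batch composition can differ between epochs.
epoch1 = list(batches(data, 3))
epoch2 = list(batches(data, 3))

# Fixed assignment (torchtext-like): identical batches every epoch,
# even if the iterator later yields them in a random order.
fixed = list(range(10))
random.shuffle(fixed)
epoch_a = list(batches(data, 3, indices=fixed))
epoch_b = list(batches(data, 3, indices=fixed))
assert epoch_a == epoch_b
```

Either way every example is seen exactly once per epoch; the question is only whether the grouping into batches is redrawn.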