Hi. This is almost a theory question. Sorry if this forum is not an appropriate place to ask.
Are there any advantages to shuffling a dataset when training a neural network with a batch size of 1?
Since ordinary feed-forward networks (no recurrence inside) have no way to take past (or future) inputs into account, I have a feeling that shuffling the dataset makes no difference as long as the batch size is 1.
Is this correct?
Shuffling is always a good idea.
For example, suppose you are writing a classifier for 10 different categories and the first 1000 examples all belong to category 1, the next 1000 to category 2, and so on. In that case, the network first overfits to category 1, then to category 2, and so forth, and it fails to generalize its learning across all the categories.
Note that even with a batch size of 1, the order of examples still matters: each gradient step updates the weights that the next step builds on, so the update sequence is order-dependent. Shuffling the data breaks this structured learning and reduces the bias you would otherwise get from the same unshuffled dataset.
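To make the idea concrete, here is a minimal sketch (toy data, not a real training loop): with batch size 1 and an unshuffled class-sorted dataset, the network sees every example of one class before any example of the next, while reshuffling the index order each epoch interleaves the classes so successive gradient steps pull in different directions.

```python
import random

# Toy dataset sorted by class: 3 classes, each a contiguous block,
# stored as (label, feature) pairs -- like the 10-category example above.
dataset = [(cls, i) for cls in range(3) for i in range(4)]

# Without shuffling, batch-size-1 training visits all of class 0,
# then all of class 1, then all of class 2.
unshuffled_labels = [label for label, _ in dataset]

# Shuffle the *index order* each epoch (the usual practice),
# leaving the dataset itself untouched.
random.seed(0)  # fixed seed so the example is reproducible
epoch_order = list(range(len(dataset)))
random.shuffle(epoch_order)
shuffled_labels = [dataset[i][0] for i in epoch_order]

print(unshuffled_labels)  # class labels appear in sorted blocks
print(shuffled_labels)    # same labels, now interleaved across classes
```

In practice most frameworks do this for you; for instance, PyTorch's `DataLoader` reshuffles the sample order every epoch when constructed with `shuffle=True`.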
Oh yeah. That’s a very reasonable explanation. Thank you so much!
Don’t forget to do k-fold cross validation too.