I am writing a training loop with a dataloader that was created with shuffle=True.
To my knowledge, people usually write the epoch and iteration loops over a dataloader as follows:
for epoch in epochs:
    # i, inputs, targets are used here to avoid shadowing the builtins iter() and input()
    for i, (inputs, targets) in enumerate(dataloader):
        """Do Training"""
Given this common pattern, I have two questions.
How does the enumerate function shuffle the data order in the dataloader?
What in the dataloader is called by the enumerate function?
If you create your dataloader with shuffle=True and a specific batch size, the dataloader will shuffle your data and put the shuffled data into batches of that size.
If you then call enumerate() on the dataloader, you can loop over the shuffled batches and get a counter as well.
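To make this concrete, here is a minimal sketch, assuming a toy TensorDataset holding the numbers 0 to 9 (the dataset, the batch size of 4, and the variable names are made up for illustration). It only shows that enumerate() supplies the counter while the dataloader supplies the shuffled batches:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset (illustrative only): inputs 0..9, targets equal to the inputs.
data = torch.arange(10)
dataset = TensorDataset(data, data.clone())

# shuffle=True makes the loader draw samples in a random order.
loader = DataLoader(dataset, batch_size=4, shuffle=True)

counters, seen = [], []
for i, (inputs, targets) in enumerate(loader):
    # enumerate() only adds the counter i; the batch itself comes from the loader.
    print(i, inputs.tolist())
    counters.append(i)
    seen.extend(inputs.tolist())
```

With 10 samples and a batch size of 4 there are three batches (the last one holds the remaining 2 samples), and every sample shows up exactly once per pass, just in a random order.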
So that means the dataloader, given shuffle=True and a batch_size, calculates the number of batches, and the loader reshuffles once enumerate reaches the end?
But the data is already shuffled by the time you iterate over enumerate(dataloader); enumerate() itself only adds a counter and does no shuffling.
The dataloader first shuffles the data (more precisely, the order of the sample indices) and puts it into batches. When you call enumerate(dataloader), you iterate over these shuffled batches.
The batches themselves are not shuffled by calling enumerate(); the data contained in these batches is shuffled by the dataloader (and is reshuffled at every epoch, according to the PyTorch docs).
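The per-epoch reshuffling can be checked with a small sketch like the one below. It assumes a toy dataset of 100 numbers and passes a seeded torch.Generator only to make the run reproducible; both are illustrative choices, not something the original training loop needs:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset (illustrative only): the numbers 0..99.
dataset = TensorDataset(torch.arange(100))

# Optional seeded generator so repeated runs of this script behave the same.
g = torch.Generator().manual_seed(0)
loader = DataLoader(dataset, batch_size=10, shuffle=True, generator=g)

epoch_orders = []
for epoch in range(2):
    order = []
    # Each new pass over the loader uses a freshly drawn sample order.
    for i, (batch,) in enumerate(loader):
        order.extend(batch.tolist())
    epoch_orders.append(order)
    print(f"epoch {epoch}: first batch = {order[:10]}")
```

Both epochs visit all 100 samples exactly once, but in a different order, which is the reshuffling the docs describe.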