We are having trouble with the shuffling behavior of the DataLoader. It seems that the DataLoader reshuffles the whole dataset and forms new batches at the beginning of every epoch.
However, we are doing semi-supervised training, and we have to make sure that the same images are sent to the model, in the same batches, at every epoch.
For example, let's say our batches are as follows:
Batch 1 consists of images [a,b,c,…]
Batch 2 consists of images [f,g,h,…]
Batch n consists of images [x,y,z,…]
So after the first epoch, we need exactly the same batches in all later epochs, because at every epoch we use the first image of each batch: a, f, … and x in the example above. The model needs the other images too, so we cannot drop them, but we specifically need those first images to stay the same.
Our training method requires that we shuffle the data once at the very beginning, form batches from that shuffled data, and use exactly the same batches for the rest of the training.
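To make the requirement concrete, here is a minimal sketch in plain Python of "shuffle once, then reuse the same batches". The helper name `make_fixed_batches`, the seed, and the toy sizes are made up for illustration and are not part of our actual code.

```python
import random

def make_fixed_batches(num_samples, batch_size, seed=0):
    """Shuffle the sample indices once and split them into fixed batches."""
    indices = list(range(num_samples))
    # One-time shuffle; the seed only makes the result reproducible.
    random.Random(seed).shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, num_samples, batch_size)]

# Built once, before training starts.
fixed_batches = make_fixed_batches(num_samples=10, batch_size=4, seed=0)

# Record the first index of every batch in each epoch to show it never changes.
first_indices_per_epoch = []
for epoch in range(3):
    first_indices_per_epoch.append([batch[0] for batch in fixed_batches])
```

If I understand the DataLoader documentation correctly, a list of index lists like `fixed_batches` can also be passed as the `batch_sampler` argument (leaving `batch_size` and `shuffle` at their defaults), so the loader would yield exactly these batches in every epoch.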
At the beginning, using a DataLoader causes no problems, but as I mentioned, we have seen that new batches are formed at each epoch. We also tried SubsetRandomSampler, but it did not solve the problem.
Thank you so much for your responses :))
However, I am not familiar enough with samplers. Can you suggest a way to shuffle the dataset once at the beginning?
Maybe then we can disable the shuffle option of the DataLoader and get what we want.
I have the same problem in my project. I want to shuffle the dataset just once, at the beginning of the training. Although I use SubsetRandomSampler, the dataset is reshuffled every epoch. From what I found online, the DataLoader treats every complete pass over the dataset as one epoch. Do you have a solution for this issue? Thank you.
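As far as I can tell, SubsetRandomSampler draws a new random order every time the loader is iterated, which is why the order changes each epoch. One possible workaround, sketched below, is to draw a single permutation up front and wrap the dataset in `Subset` with `shuffle=False`. The toy `TensorDataset`, the seed, and the batch size are placeholders, not anyone's real setup.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Toy dataset standing in for the real images (hypothetical data).
dataset = TensorDataset(torch.arange(10))

# Draw one permutation before training; the seeded generator makes it reproducible.
generator = torch.Generator().manual_seed(0)
perm = torch.randperm(len(dataset), generator=generator).tolist()

# Subset fixes the order; shuffle=False keeps the loader from reordering it.
loader = DataLoader(Subset(dataset, perm), batch_size=4, shuffle=False)

# Iterate twice to compare the batches of two epochs.
epoch_batches = []
for epoch in range(2):
    epoch_batches.append([batch[0].tolist() for batch in loader])
```

Here `epoch_batches[0]` and `epoch_batches[1]` hold the same batches in the same order, because the only randomness happened once, when `perm` was created.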
Hello everyone,
Thank you so much for all of your responses, and yes, we did it. Apparently scikit-learn has a utility named shuffle (sklearn.utils.shuffle). After importing it, we created a new variable for the shuffled version of the data. Giving that to the DataLoader and disabling shuffling solved our problem.
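For anyone landing here later, this is roughly what that looks like. It is only a sketch with made-up toy arrays (`images`, `labels`) and an arbitrary `random_state`; the real data and dataset class will differ.

```python
import numpy as np
import torch
from sklearn.utils import shuffle
from torch.utils.data import DataLoader, TensorDataset

# Toy arrays standing in for the real images and labels (hypothetical data).
images = np.arange(10, dtype=np.float32).reshape(10, 1)
labels = np.arange(10)

# Shuffle both arrays once, with the same permutation, before training.
shuffled_images, shuffled_labels = shuffle(images, labels, random_state=0)

dataset = TensorDataset(
    torch.from_numpy(shuffled_images),
    torch.from_numpy(shuffled_labels),
)

# shuffle=False: the loader keeps the order produced by the one-time shuffle.
loader = DataLoader(dataset, batch_size=4, shuffle=False)

first_epoch = [y.tolist() for _, y in loader]
second_epoch = [y.tolist() for _, y in loader]
```

Passing both arrays to `shuffle` in one call keeps each image aligned with its label, and `first_epoch` and `second_epoch` contain identical batches.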