I am currently working on contrastive learning and use one DataLoader that loads data randomly for negative samples, and one unshuffled DataLoader so I can keep loading data from the same categories. In my understanding, multiple workers run in parallel, so the DataLoader can pick up whatever batch is ready at the moment. So if I have to keep the order of batches, it will generally slow down the loading process. I know there are better and simpler implementations for positive sampling that get around this issue; it is just my curiosity taking me here.
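For reference, here is a minimal sketch of the two-loader setup I mean (the dataset is a toy stand-in, not my actual ImageNet pipeline). One detail worth noting: even with multiple workers, `DataLoader` always yields batches in sampler order, so a batch that finishes early still waits for earlier ones.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in dataset; in my case these are raw, variably-sized images.
dataset = TensorDataset(torch.arange(100).float().unsqueeze(1))

# Shuffled loader for negative samples.
neg_loader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=2)

# Unshuffled loader so batches keep category/index order for positives.
pos_loader = DataLoader(dataset, batch_size=8, shuffle=False, num_workers=2)

# With shuffle=False the first batch is deterministic: indices 0..7,
# regardless of how many workers are loading in parallel.
pos_batch = next(iter(pos_loader))[0]
print(pos_batch.squeeze(1).tolist())
```

My question is essentially whether the ordering guarantee of the second loader makes it slower than the shuffled one when workers finish out of order.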
Btw, the data are un-preprocessed images that vary in size. And I can't preprocess and save them to my own storage, because it's ImageNet and it's read-only on that host.