The detail of shuffle in DataLoader

igreen · October 1, 2018, 10:54pm

The DataLoader documentation says: shuffle (bool, optional): set to True to have the data reshuffled at every epoch (default: False). How does the DataLoader know when one epoch has ended, so that it can reshuffle the training data? For example, suppose I use a DataLoader with shuffle to load test data:

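from torch.utils.data import DataLoader

# DataLoader expects a Dataset object, not a file path
test_data = DataLoader(test_dataset, shuffle=True)
model.eval()
test_model(test_data, model)
test_model(test_data, model)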

I run the test twice. Is the order of the test data the same both times? In other words, how does shuffle work in this case, where there is no explicit “epoch”?

InnovArul · October 1, 2018, 10:59pm

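When you iterate over the DataLoader fully, that counts as one epoch. The DataLoader does not need to detect the end of an epoch: with shuffle=True, every time a new iteration starts (for example, a new for loop), its sampler draws a fresh random order.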

for data in test_data:
    do_test(data)

# one epoch is completed/all data points have been tested
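A minimal sketch of this, assuming PyTorch and a toy TensorDataset whose sample i is just the integer i (the dataset, batch size and number of epochs are illustrative choices, not from the thread). The inner loop finishing is the end of the epoch; nothing else signals it.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset: sample i is just the integer i, so the order is visible.
dataset = TensorDataset(torch.arange(8))
loader = DataLoader(dataset, batch_size=3, shuffle=True)

epoch_orders = []
for epoch in range(3):
    order = []
    # Starting a new `for` loop creates a new iterator; with shuffle=True
    # the sampler draws a fresh permutation at this point.
    for (batch,) in loader:
        order.extend(batch.tolist())
    # The inner loop has finished: one epoch is complete and every
    # sample has been seen exactly once.
    epoch_orders.append(order)
    print(f"epoch {epoch}: {order}")
```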

For testing, shuffle is normally set to False, so the order of the data stays the same from run to run.
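To answer the original question directly, here is a sketch that runs two full passes with each setting and compares the orders. It assumes a toy TensorDataset with illustrative sizes; the helper one_pass is hypothetical. With shuffle=False the DataLoader uses a sequential sampler, so both passes match; with shuffle=True each pass gets a new random order.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def one_pass(loader):
    """Hypothetical helper: sample ids in the order one full pass yields them."""
    return [i for (batch,) in loader for i in batch.tolist()]


dataset = TensorDataset(torch.arange(20))

# shuffle=False: every pass yields 0, 1, ..., 19.
fixed_loader = DataLoader(dataset, batch_size=4, shuffle=False)
first_fixed, second_fixed = one_pass(fixed_loader), one_pass(fixed_loader)

# shuffle=True: each new pass draws a new random permutation.
shuffled_loader = DataLoader(dataset, batch_size=4, shuffle=True)
first_shuffled, second_shuffled = one_pass(shuffled_loader), one_pass(shuffled_loader)

print(first_fixed == second_fixed)  # True
print(first_shuffled == second_shuffled)  # different orders with overwhelming probability
```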

LW-Ricarido · October 31, 2018, 8:07am

I currently load only part of my training set, and I want to get the same part of it in every epoch. Is there any way to control what one epoch covers?
My code looks like this. I use the following to create my DataLoader:

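from torch.utils.data import DataLoader

def make_loader(dataset, args):  # hypothetical wrapper around the original function body
    return DataLoader(
        dataset,
        batch_size=args.batch_size,
        shuffle=args.shuffle,
        num_workers=args.workers,
        pin_memory=True,
    )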

and this to iterate over only part of the training set:

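from itertools import islice
# islice keeps only the first self.data_size batches (not samples) of each pass
for input_tensor, target in islice(train_loader, self.data_size):
    ...  # my training code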
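A sketch of why this approach yields a different part of the training set each epoch, assuming args.shuffle is True. The toy dataset, batch size and data_size value are hypothetical stand-ins for the question's setup. Because each epoch starts a new shuffled pass, islice takes the first few batches of a different permutation every time.

```python
from itertools import islice

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(100))
# Assumption: args.shuffle is True, as in the question.
loader = DataLoader(dataset, batch_size=10, shuffle=True)
data_size = 3  # hypothetical: number of *batches* islice keeps, i.e. 30 samples

seen_per_epoch = []
for epoch in range(2):
    seen = set()
    # islice stops after the first data_size batches of this epoch's
    # freshly shuffled order, so the chosen subset changes every epoch.
    for (batch,) in islice(loader, data_size):
        seen.update(batch.tolist())
    seen_per_epoch.append(seen)

print(len(seen_per_epoch[0]), len(seen_per_epoch[1]))  # 30 30
print(seen_per_epoch[0] == seen_per_epoch[1])  # almost always False
```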

InnovArul · October 31, 2018, 11:45am

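You can use SubsetRandomSampler to do so: give it the fixed list of indices that make up your partial training set and pass it to DataLoader as sampler (leave shuffle at False, since the two options are mutually exclusive). Each epoch then covers exactly those samples, in a new random order.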

https://pytorch.org/docs/stable/data.html#torch.utils.data.SubsetRandomSampler
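A minimal sketch of the SubsetRandomSampler approach, assuming a toy 100-sample dataset and, as a hypothetical choice of "part of the train set", the first 30 indices:

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

dataset = TensorDataset(torch.arange(100))

# Hypothetical choice of "part of the train set": the first 30 indices.
# Any fixed list works, as long as it is chosen once, outside the epoch loop.
subset_indices = list(range(30))
sampler = SubsetRandomSampler(subset_indices)

# No shuffle=True here: the sampler itself shuffles the given indices.
loader = DataLoader(dataset, batch_size=10, sampler=sampler)

epochs = []
for epoch in range(3):
    # Each pass visits exactly the 30 chosen samples, in a new random order.
    order = [i for (batch,) in loader for i in batch.tolist()]
    epochs.append(order)
```

Fixing subset_indices once is what keeps the subset identical across epochs; only the order within it changes.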

basingse · May 20, 2019, 11:05am

I am facing a problem with shuffle=True, described in this thread: [DataLoader Problem] Problem arises when shuffle = True

Can someone tell me what I am doing wrong?