The details of shuffle in DataLoader

The DataLoader docs say: shuffle (bool, optional): set to True to have the data reshuffled at every epoch (default: False). So how does the DataLoader know when one epoch ends, so that it can reshuffle the training data? And if I use a DataLoader with shuffle to load testing data, for example,

test_data = DataLoader(test_dataset, shuffle=True)  # test_dataset is a torch.utils.data.Dataset
model.eval()
test_model(test_data, model)
test_model(test_data, model)

and run the test twice, is the order of the test data the same both times? In other words, how does shuffle work in this case? There is no “epoch” concept here.

When you iterate over the DataLoader fully, that counts as one epoch.

for data in test_data:
    do_test(data)

# one epoch is completed/all data points have been tested

For testing, shuffle is normally set to False, so the order of the data stays the same. If you do pass shuffle=True, a fresh random order is drawn each time you start iterating over the loader, so your two test runs above would see the data in different orders.
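A minimal sketch of this behavior (the toy dataset here is illustrative, not from your code):

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(6))  # six toy samples: 0..5

loader = DataLoader(dataset, batch_size=2, shuffle=True)
print([batch[0].tolist() for batch in loader])  # e.g. [[4, 1], [0, 5], [3, 2]]
print([batch[0].tolist() for batch in loader])  # a new random order on the second pass

loader = DataLoader(dataset, batch_size=2, shuffle=False)
print([batch[0].tolist() for batch in loader])  # always [[0, 1], [2, 3], [4, 5]]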


Now I only load part of my train set, and I want to get the same part of the train set at every epoch. Is there any way to do that?
My code looks like this. I use this to build my dataloader:

return DataLoader(
    dataset,
    batch_size=args.batch_size,
    shuffle=args.shuffle,
    num_workers=args.workers,
    pin_memory=True,
)

and this to take part of the train set:

from itertools import islice

# take only the first self.data_size batches of the loader
for (input_tensor, target) in islice(train_loader, self.data_size):
    ...  # my train code

You can use SubsetRandomSampler to do so. With shuffle enabled, islice just takes whichever batches happen to come first after each reshuffle, so you get a different subset every epoch; a SubsetRandomSampler built from a fixed list of indices always draws from that same subset, only in a new random order each epoch.

https://pytorch.org/docs/stable/data.html#torch.utils.data.SubsetRandomSampler
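A minimal sketch of that approach, reusing the dataset and args names from your snippets (subset_size is illustrative): choose the indices once, then pass the sampler instead of shuffle.

import torch
from torch.utils.data import DataLoader, SubsetRandomSampler

subset_size = 1000  # illustrative; pick however many samples you need
indices = torch.randperm(len(dataset))[:subset_size].tolist()  # chosen once, reused every epoch

train_loader = DataLoader(
    dataset,
    batch_size=args.batch_size,
    sampler=SubsetRandomSampler(indices),  # sampler and shuffle are mutually exclusive
    num_workers=args.workers,
    pin_memory=True,
)

Every epoch then iterates over the same subset_size samples, in a fresh random order.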

I am facing a problem with shuffle = True in this thread: [DataLoader Problem] Problem arises when shuffle = True

Can you tell me where I am going wrong?