Samplers vs Shuffling

Hi,

I’m new to PyTorch and was wondering how I should shuffle my training dataset. I’ve seen some examples that use a RandomSampler, as follows:

from torch.utils.data import TensorDataset, DataLoader, RandomSampler

train_data = TensorDataset(train_inputs, train_masks, train_labels)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

What if I did not use a sampler at all and instead set the shuffle parameter to True, as follows:

train_data = TensorDataset(train_inputs, train_masks, train_labels)
train_dataloader = DataLoader(train_data, shuffle=True, batch_size=batch_size)

Is this the same thing?

Also, are using a SequentialSampler and setting the shuffle parameter to False the same thing?

Thanks

Yes, if you use shuffle=True, the DataLoader will initialize a RandomSampler for you; otherwise it'll use a SequentialSampler, as seen in these lines of code.
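To make the distinction concrete, here is a minimal pure-Python sketch of what the two samplers yield, namely indices into the dataset. The real PyTorch classes use torch.randperm for the random case, but the idea is the same; the function names here are just for illustration:

```python
import random

def sequential_indices(n):
    # SequentialSampler-style: indices in order, 0 .. n-1
    return list(range(n))

def random_indices(n, seed=None):
    # RandomSampler-style (without replacement): a shuffled permutation
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx

n = 5
print(sequential_indices(n))      # always [0, 1, 2, 3, 4]
print(random_indices(n, seed=0))  # some permutation of 0..4
```

The DataLoader then groups these indices into batches, so shuffle=True shuffles which samples land in which batch each epoch, not the contents of your tensors.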


Thanks for the clarification!