Samplers vs Shuffling

Hi,

I’m new to PyTorch and was wondering how I should shuffle my training dataset. I’ve seen some examples that use a RandomSampler, as follows:

from torch.utils.data import TensorDataset, DataLoader, RandomSampler

train_data = TensorDataset(train_inputs, train_masks, train_labels)
train_sampler = RandomSampler(train_data)
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)

What if I did not use a sampler at all and instead set the shuffle parameter to True, as follows:

train_data = TensorDataset(train_inputs, train_masks, train_labels)
train_dataloader = DataLoader(train_data, shuffle=True, batch_size=batch_size)

Is this the same thing?

Also, are using a SequentialSampler and setting the shuffle parameter to False the same thing?

Thanks

Yes, if you use shuffle=True, the DataLoader will initialize a RandomSampler for you; otherwise it'll use a SequentialSampler, as seen in these lines of code.
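To make the distinction concrete, here is a minimal pure-Python sketch of what the two samplers yield, namely indices into the dataset. The real PyTorch classes use torch.randperm for the random case, but the idea is the same; the function names here are just for illustration:

```python
import random

def sequential_indices(n):
    # SequentialSampler-style: indices in order, 0 .. n-1
    return list(range(n))

def random_indices(n, seed=None):
    # RandomSampler-style (without replacement): a shuffled permutation
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx

n = 5
print(sequential_indices(n))      # always [0, 1, 2, 3, 4]
print(random_indices(n, seed=0))  # some permutation of 0..4
```

The DataLoader then groups these indices into batches, so shuffle=True shuffles which samples land in which batch each epoch, not the contents of your tensors.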


Thanks for the clarification!