Difference between batch size in data loader and num_samples in Weighted Random Sampler

I would like to ask about the difference between the batch size in the DataLoader and num_samples in the WeightedRandomSampler.

I used to set num_samples=len(sample_weights) with a DataLoader batch size of 16. Then, after about 50 epochs, I changed it to num_samples=16 and my training accuracy went down, though my validation accuracy did not change much.

I am confused: I know num_samples means the number of samples we draw in each iteration, but is it the same as the DataLoader batch size?

Thank you in advance

The num_samples argument in WeightedRandomSampler defines the number of samples drawn in each epoch, while the batch_size in the DataLoader defines the number of samples drawn in each iteration.
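For illustration, here is a minimal sketch of that distinction. The toy dataset, the 90/10 class split, and all variable names are made up for this example:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Hypothetical imbalanced dataset: 90 samples of class 0, 10 of class 1.
targets = torch.cat([torch.zeros(90), torch.ones(10)]).long()
data = torch.randn(100, 4)
dataset = TensorDataset(data, targets)

# Per-sample weights via inverse class frequency, so the minority
# class is oversampled.
class_counts = torch.bincount(targets)
sample_weights = 1.0 / class_counts[targets].float()

# num_samples controls how many samples the sampler yields per epoch...
sampler = WeightedRandomSampler(weights=sample_weights,
                                num_samples=len(sample_weights),
                                replacement=True)

# ...while batch_size controls how many samples each iteration yields.
loader = DataLoader(dataset, batch_size=16, sampler=sampler)

# Iterations per epoch = ceil(num_samples / batch_size).
print(len(loader))  # ceil(100 / 16) = 7
```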

Thanks for clarifying @ptrblck. I wonder which is better: drawing samples from the entire length of the dataset, or drawing fewer?

The common approach would be to let the sampler use all samples, unless you have a valid reason to decrease the number of samples drawn in each epoch.
Note that even then you would still be able to draw all samples, but you would need to increase the number of epochs to draw the same total number of samples.
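Continuing the hypothetical sketch above, reducing num_samples to 16 shrinks an epoch to a single iteration, which is why more epochs are needed to draw the same total number of samples:

```python
# Same setup as before, but with num_samples reduced to 16: each "epoch"
# is now a single 16-sample iteration, so roughly len(dataset) / 16 times
# as many epochs are needed to draw as many samples as with the full
# num_samples=len(sample_weights) setting.
short_sampler = WeightedRandomSampler(weights=sample_weights,
                                      num_samples=16,
                                      replacement=True)
short_loader = DataLoader(dataset, batch_size=16, sampler=short_sampler)
print(len(short_loader))  # 1 iteration per epoch
```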
