WeightedRandomSampler batch size argument vs DataLoader batch size

I am not quite sure how to properly use a WeightedRandomSampler with a DataLoader. Both constructors take a batch_size argument. DataLoader also has an optional sampler argument. I am trying to use a WeightedRandomSampler as the sampler in my DataLoader as follows:

sampler = WeightedRandomSampler(trainset.sample_weights, batch_size=BATCH_SIZE, replacement=False)
trainloader = DataLoader(dataset=trainset, batch_size=BATCH_SIZE, num_workers=6, sampler=sampler)
valloader   = DataLoader(dataset=valset, batch_size=BATCH_SIZE, num_workers=6, shuffle=True)

However, when I iterate over the trainloader, I only get one batch, even though BATCH_SIZE=8 and len(trainset) is in the thousands. If I remove the sampler argument from my DataLoader, I get the expected number of batches of size 8, with the last batch smaller than 8. Why do I get only one batch when I pass the sampler to my DataLoader? I am new to torch and trying to learn it. I tried reading the source code, but it is really unclear to me what's going on.


Hi,

In your WeightedRandomSampler you must specify the number of samples you want to draw during one epoch (num_samples, which is not a batch size); the batch_size is specified in the DataLoader.
Maybe you can try this:

sampler = WeightedRandomSampler(trainset.sample_weights, num_samples=len(trainset), replacement=False)
trainloader = DataLoader(dataset=trainset, batch_size=BATCH_SIZE, num_workers=6, sampler=sampler)
valloader   = DataLoader(dataset=valset, batch_size=BATCH_SIZE, num_workers=6, shuffle=True)
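
A quick way to see why num_samples (and not a batch size) matters here: when you pass a sampler, the DataLoader builds ceil(num_samples / batch_size) batches per epoch, so a sampler that only yields BATCH_SIZE indices gives exactly one batch. A minimal sketch, using a toy TensorDataset and uniform weights as stand-ins for your trainset and sample_weights:

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

BATCH_SIZE = 8
trainset = TensorDataset(torch.randn(1000, 4))   # stand-in for the real trainset
weights = torch.ones(len(trainset))              # stand-in for trainset.sample_weights

# a sampler that covers the whole dataset vs. one that only yields 8 indices
full_sampler  = WeightedRandomSampler(weights, num_samples=len(trainset), replacement=False)
short_sampler = WeightedRandomSampler(weights, num_samples=BATCH_SIZE, replacement=False)

print(len(DataLoader(trainset, batch_size=BATCH_SIZE, sampler=full_sampler)))   # 125 batches
print(len(DataLoader(trainset, batch_size=BATCH_SIZE, sampler=short_sampler)))  # 1 batch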

Thank you, I'll have to look at the source with this new info. However, there is a batch_size argument for WeightedRandomSampler(), so I wonder why you need num_samples as well. Are there situations where you'd use both? Combined with the fact that it is passed to a DataLoader() which has its own batch_size, I'll just say this is a confusing interface.


It may depend on the version of PyTorch you use; in the latest one (1.0.0), WeightedRandomSampler no longer has a batch_size argument: https://pytorch.org/docs/stable/_modules/torch/utils/data/sampler.html
But yes, I agree that the interface is a bit confusing!
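
For reference, the 1.0-style signature is WeightedRandomSampler(weights, num_samples, replacement=True), where weights has one entry per sample. A common recipe for class imbalance (a sketch with a made-up label tensor, not your data) is to derive per-sample weights from inverse class counts:

import torch
from torch.utils.data import WeightedRandomSampler

labels = torch.randint(0, 3, (1000,))     # hypothetical per-sample class labels
class_counts = torch.bincount(labels).float()
class_weights = 1.0 / class_counts        # rarer classes get larger weights
sample_weights = class_weights[labels]    # one weight per sample

sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(sample_weights),
                                replacement=True)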