WeightedRandomSampler not equally sampling

Why does torch.utils.data.WeightedRandomSampler sample these values only approximately equally when the weights are equal?

```python
list(WeightedRandomSampler([0.3, 0.3, 0.3], 1000, replacement=True)).count(2)
Out[15]: 346
list(WeightedRandomSampler([0.3, 0.3, 0.3], 1000, replacement=True)).count(1)
Out[16]: 339
list(WeightedRandomSampler([0.3, 0.3, 0.3], 1000, replacement=True)).count(0)
Out[17]: 319
```

Shouldn’t this yield 333, 333, 333?

The passed weights are used in torch.multinomial, as seen here, which draws random samples, so you cannot expect a perfectly even split.

The more samples you draw, the closer the counts will get to the expected numbers.
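
Concretely, with n independent draws and per-class probability p, each class count follows a binomial distribution, so its one-sigma spread is sqrt(n * p * (1 - p)). A quick sketch of what that means for the numbers above (outputs will vary from run to run):

```python
import math

import torch
from torch.utils.data import WeightedRandomSampler

n = 1000
p = 1 / 3  # weights [0.3, 0.3, 0.3] normalize to equal per-class probabilities

expected = n * p
std = math.sqrt(n * p * (1 - p))
print(f"expected count per class: {expected:.1f} +/- {std:.1f}")
# -> expected count per class: 333.3 +/- 14.9
# so observed counts like 319 or 346 are within roughly one sigma

indices = torch.tensor(
    list(WeightedRandomSampler([0.3, 0.3, 0.3], n, replacement=True))
)
print(torch.bincount(indices))  # e.g. tensor([330, 342, 328]); varies per run
```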

Thanks for the reply. I understand that, due to the randomness, the sampling is not perfectly uniform.
But I would not expect results like these:

```python
list(WeightedRandomSampler([0.3, 0.3, 0.3], 100000, replacement=True)).count(0)
Out[5]: 33548
list(WeightedRandomSampler([0.3, 0.3, 0.3], 100000, replacement=True)).count(1)
Out[6]: 33101
list(WeightedRandomSampler([0.3, 0.3, 0.3], 100000, replacement=True)).count(2)
Out[7]: 33176
```

Is there a way to make this more uniform?

These numbers would represent a ~0.6% error, wouldn’t they?
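
As a quick sanity check of the ~0.6% figure, compared against the one-sigma spread a binomial count would have at n = 100000:

```python
import math

n, p = 100000, 1 / 3
expected = n * p  # ≈ 33333.3

# Largest observed deviation from the expected count:
print(abs(33548 - expected) / expected)  # ≈ 0.0064, i.e. ~0.6%

# One-sigma relative spread of a binomial count, for comparison:
print(math.sqrt(n * p * (1 - p)) / expected)  # ≈ 0.0045, i.e. ~0.45%
```

So the observed ~0.6% deviation is roughly 1.4 sigma, well within normal statistical fluctuation.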

Do you need an exact number of samples per class during sampling?
If so, I think writing a custom sampler that yields a specified count per class might be the better approach.
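
A minimal sketch of what such a sampler could look like, assuming targets holds one class label per dataset sample (ExactClassCountSampler and n_per_class are illustrative names, not a built-in PyTorch API):

```python
import torch
from torch.utils.data import Sampler


class ExactClassCountSampler(Sampler):
    """Yields indices such that every class appears exactly n_per_class times.

    Samples within each class with replacement, so classes smaller than
    n_per_class can still contribute the full count.
    """

    def __init__(self, targets, n_per_class):
        self.targets = torch.as_tensor(targets)
        self.n_per_class = n_per_class
        self.classes = torch.unique(self.targets)

    def __iter__(self):
        picks = []
        for c in self.classes:
            idx = torch.nonzero(self.targets == c, as_tuple=True)[0]
            # Draw exactly n_per_class indices from this class (with replacement).
            picks.append(idx[torch.randint(len(idx), (self.n_per_class,))])
        picks = torch.cat(picks)
        # Shuffle so batches are not grouped by class.
        return iter(picks[torch.randperm(len(picks))].tolist())

    def __len__(self):
        return len(self.classes) * self.n_per_class
```

It could then be passed to a DataLoader via its sampler argument, e.g. DataLoader(dataset, batch_size=32, sampler=ExactClassCountSampler(targets, n_per_class=100)).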

Yes, you are right. I guess if one wants to generate a uniform distribution over imbalanced classes with a low number of samples, the WeightedRandomSampler might not be the best solution (for num_samples=100 we see a deviation of up to 5%).

Thanks again for your help!
