Weighted random sampler pytorch

Prachi · August 23, 2022, 1:34pm

Can someone explain intuitively how a weighted random sampler works?

from torch.utils.data import WeightedRandomSampler
list(WeightedRandomSampler([0.1, 0.9, 0.4, 0.7, 3.0, 0.6], 5, replacement=True))

For example, this is how I read the code above:
Suppose I have an image dataset with 6 images, and each image has a weight (similar to a probability).
The sampler draws 5 indices according to these weights, with replacement, so the image at index 4 (weight 3.0) is the most likely to be chosen, followed by indices 1, 3, 5, 2, and 0.
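Because the weights are relative, each index's chance of being picked on a single draw is its weight divided by the sum of all weights (5.7 here). One way to check this reading is to draw many samples and compare the observed frequencies with the normalized weights. This is a minimal sketch assuming a reasonably recent PyTorch; the draw count and seed are arbitrary choices.

```python
import torch
from collections import Counter
from torch.utils.data import WeightedRandomSampler

weights = [0.1, 0.9, 0.4, 0.7, 3.0, 0.6]

# Relative weights become probabilities after dividing by their sum (5.7 here).
total = sum(weights)
probs = [w / total for w in weights]

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
num_draws = 100_000
sampler = WeightedRandomSampler(weights, num_samples=num_draws, replacement=True)
counts = Counter(list(sampler))  # the sampler yields plain Python ints

# Print indices from highest to lowest weight with expected vs. observed frequency.
for idx in sorted(range(len(weights)), key=lambda i: -weights[i]):
    print(f"index {idx}: weight={weights[idx]:.1f} "
          f"expected={probs[idx]:.3f} observed={counts[idx] / num_draws:.3f}")
```

Index 4 should account for roughly 3.0 / 5.7 ≈ 0.526 of the draws, with the other indices following in order of their weights.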

list(WeightedRandomSampler([0.9, 0.4, 0.05, 0.2, 0.3, 0.1], 5, replacement=False))
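Here the sampler is most likely to choose the image at index 0, then index 1, then index 4, and so on, but without replacement, so each image can be selected at most once. Since num_samples = 5 and there are 6 images, five distinct images are drawn and one (most likely index 2, with weight 0.05) is left out.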
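To see the without-replacement behavior empirically, the sketch below repeats the 5-draw experiment many times, checks that no index repeats within a draw, and counts which of the 6 indices was left out. The trial count and seed are arbitrary; each pass over the sampler draws a fresh set of indices, so the trials are independent.

```python
import torch
from collections import Counter
from torch.utils.data import WeightedRandomSampler

weights = [0.9, 0.4, 0.05, 0.2, 0.3, 0.1]

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
sampler = WeightedRandomSampler(weights, num_samples=5, replacement=False)

# Repeat the 5-draw experiment and record which index was not drawn.
num_trials = 2_000
left_out = Counter()
for _ in range(num_trials):
    drawn = list(sampler)
    assert len(set(drawn)) == 5  # no index repeats without replacement
    missing = set(range(len(weights))) - set(drawn)
    left_out[missing.pop()] += 1  # exactly one of the 6 indices is missing

print(left_out.most_common())
```

In runs like this, index 2 (the smallest weight) should be the most frequently excluded index, but every index has some chance of being the one left out.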

Thank you!!

ptrblck · August 23, 2022, 8:57pm

I’m unsure where exactly you are stuck, but in general you can think of the passed weights as similar to probabilities, i.e. the higher the weight, the more likely the sampler is to pick that particular sample.
The main difference is that the weights do not need to sum to 1, as probabilities would; each sample's probability is effectively its weight divided by the sum of all weights.
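A quick empirical check of the "weights are relative" point: scaling every weight by the same factor, or normalizing them to sum to 1, should give the same sampling frequencies, and an index with weight 0 should never be drawn. This is a sketch under the assumption of a recent PyTorch where WeightedRandomSampler accepts a generator argument; the helper empirical_freqs and the draw count are illustrative choices.

```python
import torch
from torch.utils.data import WeightedRandomSampler

def empirical_freqs(weights, num_draws=200_000, seed=0):
    """Draw with replacement and return the observed frequency of each index."""
    gen = torch.Generator().manual_seed(seed)
    sampler = WeightedRandomSampler(weights, num_draws, replacement=True, generator=gen)
    idx = torch.tensor(list(sampler))
    return torch.bincount(idx, minlength=len(weights)).double() / num_draws

raw = [0.1, 0.9, 0.4, 0.7, 3.0, 0.6]       # sums to 5.7
scaled = [10 * w for w in raw]              # sums to 57
normalized = [w / sum(raw) for w in raw]    # sums to 1
with_zero = raw + [0.0]                     # extra index that should never appear

f_raw = empirical_freqs(raw)
f_scaled = empirical_freqs(scaled)
f_norm = empirical_freqs(normalized)
f_zero = empirical_freqs(with_zero)

print(f_raw)
print(f_scaled)
print(f_norm)
print(f_zero)
```

All three of the first frequency vectors should agree up to sampling noise, and the last entry of f_zero should be exactly 0.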
The replacement argument defines whether the sampler can draw the same sample again or whether a sample is removed from the “pool” once it has been drawn.
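To make the “pool” picture concrete, here is a small pure-Python illustration using only the standard library's random.choices. It is not how PyTorch implements the sampler (which, as far as I know, delegates to torch.multinomial), and the helper name weighted_draws is hypothetical. With replacement the pool never shrinks, so more samples than items can be drawn; without replacement each drawn index is removed together with its weight.

```python
import random

def weighted_draws(weights, num_samples, replacement, rng=random):
    """Illustrative sketch of weighted sampling with or without replacement.

    Not PyTorch's implementation; it only mirrors the "pool" intuition.
    """
    pool = list(range(len(weights)))
    pool_weights = list(weights)
    if not replacement and num_samples > len(pool):
        raise ValueError("cannot draw more samples than items without replacement")
    drawn = []
    for _ in range(num_samples):
        # Pick one position with probability proportional to its current weight.
        pos = rng.choices(range(len(pool)), weights=pool_weights, k=1)[0]
        drawn.append(pool[pos])
        if not replacement:
            # Remove the drawn item (and its weight) so it cannot be picked again.
            pool.pop(pos)
            pool_weights.pop(pos)
    return drawn

rng = random.Random(0)  # arbitrary seed, only for reproducibility
with_repl = weighted_draws([0.1, 0.9, 0.4, 0.7, 3.0, 0.6], 10, replacement=True, rng=rng)
without_repl = weighted_draws([0.9, 0.4, 0.05, 0.2, 0.3, 0.1], 5, replacement=False, rng=rng)
print(with_repl)     # 10 indices from 6 items, repeats allowed
print(without_repl)  # 5 distinct indices
```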

Prachi · August 29, 2022, 6:38pm

@ptrblck Thanks for your reply!

I have a CSV file with 100k rows and two columns, [‘ImageId’, ‘weight’], where the weights lie in the range [0, 1]. I want to use PyTorch’s WeightedRandomSampler to sample images according to these weights. I’d rather not build an efficient sampler from scratch, since I assume PyTorch’s APIs will save me time, and I’m under pressure to get this working. Any pointers would be really helpful.
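A minimal sketch of one way to wire this up, assuming the CSV has exactly the two columns described. The class name WeightedImageDataset, the inline CSV_TEXT stand-in, and returning the ImageId instead of a loaded image are all hypothetical placeholders; a real dataset would open the file and load/transform the image in __getitem__. The key pieces are passing the weight column to WeightedRandomSampler and handing the sampler to DataLoader via sampler=.

```python
import csv
import io

import torch
from torch.utils.data import DataLoader, Dataset, WeightedRandomSampler

# Hypothetical stand-in for the real 100k-row CSV; a real script would use
# open("your_file.csv", newline="") instead of io.StringIO.
CSV_TEXT = """ImageId,weight
img_000,0.10
img_001,0.90
img_002,0.40
img_003,0.70
img_004,0.05
img_005,0.60
"""

class WeightedImageDataset(Dataset):
    """Hypothetical dataset: reads ImageId/weight rows; image loading is a stub."""

    def __init__(self, csv_file):
        rows = list(csv.DictReader(csv_file))
        self.image_ids = [row["ImageId"] for row in rows]
        self.weights = torch.tensor([float(row["weight"]) for row in rows],
                                    dtype=torch.double)

    def __len__(self):
        return len(self.image_ids)

    def __getitem__(self, idx):
        # A real implementation would load and transform the image here.
        return self.image_ids[idx]

dataset = WeightedImageDataset(io.StringIO(CSV_TEXT))

# One "epoch" draws len(dataset) indices; row i is picked with chance weight_i / sum(weights).
sampler = WeightedRandomSampler(dataset.weights, num_samples=len(dataset), replacement=True)

# sampler= takes the place of shuffle=True; DataLoader rejects using both.
loader = DataLoader(dataset, batch_size=4, sampler=sampler)

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
drawn_ids = [image_id for batch in loader for image_id in batch]
print(drawn_ids)
```

With replacement=True and num_samples=len(dataset), each epoch has the same length as the dataset, but high-weight rows can appear several times while rows with weight 0 never appear. Setting replacement=False instead would visit each row at most once per epoch, which mostly changes the order rather than how often rows are seen.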