I’m using WeightedRandomSampler for a long-tailed dataset to get more balanced samples. It works okay, but quite a few samples are never selected. So I tried less imbalanced weights to cover more of the dataset, but many samples are still never selected. Then I tried giving every sample the same weight, but the sampling is still uneven.
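Roughly, my setup looks like this (a simplified sketch; the labels and the inverse-frequency weighting are just illustrative, not my exact code):

import collections
import torch

# simplified sketch: per-sample weight = inverse of that sample's class frequency,
# so minority-class samples are drawn more often
labels = [0] * 90 + [1] * 10  # toy long-tailed labels
class_counts = collections.Counter(labels)
weights = [1.0 / class_counts[y] for y in labels]
sampler = torch.utils.data.WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
# loader = torch.utils.data.DataLoader(my_dataset, batch_size=32, sampler=sampler)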
For example,
import collections
import torch

# 100 indices drawn with replacement, all with the same weight
samples = list(torch.utils.data.WeightedRandomSampler([0.01 for _ in range(100)], 100, replacement=True))
samples_counter = collections.Counter(samples)
And len(samples_counter) is usually around 66 (about 2/3 of 100), which means only about 2/3 of all samples are sampled even though they share the same weight.
Changing the number of samples, I still get a similar result: only about 2/3 of all samples are chosen. But I expected roughly all samples to be selected, since they are given the same weight. So how can I cover more of the dataset with WeightedRandomSampler?
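For reference, here is the quick check I ran for different sizes (a simplified sketch of what I described above):

import collections
import torch

# same experiment as above, but for several dataset sizes: draw n indices with
# equal weights and count how many distinct indices actually appear
for n in (100, 1000, 10000):
    sampler = torch.utils.data.WeightedRandomSampler([1.0] * n, n, replacement=True)
    distinct = len(collections.Counter(sampler))
    print(n, distinct / n)

The printed fraction stays close to 2/3 for every n I tried.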
Thanks for any help.