I am dealing with an imbalanced dataset, say #(positive) = 1K and #(negative) = 50K.
I found a library, imbalanced-dataset-sampler (which is based on
torch.multinomial), to resample the data and reduce the skew.
In its basic usage, it passes an array of per-sample weights to
torch.multinomial, which then returns the sampled indices (with replacement).
    # weight for each data point: 2e-5 = 1/#(negative), 1e-3 = 1/#(positive)
    weights = torch.tensor([2e-5, 1e-3, 2e-5, ...])
    sampled_indices = torch.multinomial(
        weights,
        num_samples=(num_pos + num_neg),
        replacement=True,
    )
However, when I query the original dataframe with these sampled indices, #(positive) and #(negative) come out almost equal. What is the reason behind this? Can someone give me some tips?
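For reference, here is a minimal sketch that reproduces what I see. It assumes the labels live in a hypothetical tensor `labels` (a stand-in for my real dataframe column) and that the weights are built as 1 / class size, as described above. It also prints the total weight per class, since both totals come out at about 1, which may be related to the balanced result.

```python
import torch

torch.manual_seed(0)

num_pos, num_neg = 1_000, 50_000

# Hypothetical labels standing in for the dataframe's target column:
# 1 = positive, 0 = negative.
labels = torch.cat([torch.ones(num_pos), torch.zeros(num_neg)]).long()

# Per-sample weight = 1 / (size of that sample's class),
# i.e. 1e-3 for positives and 2e-5 for negatives.
class_counts = torch.bincount(labels, minlength=2)
weights = 1.0 / class_counts[labels].float()

# Total weight carried by each class (multinomial normalizes the weights,
# so these totals determine how often each class is drawn).
mass_neg = weights[labels == 0].sum().item()
mass_pos = weights[labels == 1].sum().item()

sampled_indices = torch.multinomial(
    weights,
    num_samples=num_pos + num_neg,
    replacement=True,
)

# Count how many sampled rows fall in each class.
sampled_counts = torch.bincount(labels[sampled_indices], minlength=2)

print("total weight (neg, pos):", mass_neg, mass_pos)
print("sampled counts (neg, pos):", sampled_counts.tolist())
```
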
Moreover, I would like to adjust the positive-to-negative ratio. How do I implement this with