Hi,
I am dealing with an imbalanced dataset, say #(positive) = 1K and #(negative) = 50K.
I found a library, imbalanced-dataset-sampler (which is based on torch.multinomial), to resample the data and reduce the skew.
In its basic usage, it passes an array of per-sample weights to torch.multinomial, which returns the sampled indices (with replacement).
Example:
import torch

# weight for each data point: 2e-5 = 1/#(negative), 1e-3 = 1/#(positive)
weights = torch.tensor([2e-5, 1e-3, 2e-5, ...])  # one entry per data point (rest elided)
sampled_indices = torch.multinomial(
    weights,
    num_samples=num_pos + num_neg,
    replacement=True,
)
However, when I query the original dataframe with these sampled indices, #(positive) and #(negative) are almost equal. I am wondering what the reason behind this is. Can someone give me some tips?
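To make this easier to reproduce, here is a minimal self-contained sketch of what I am seeing. The synthetic `labels` tensor stands in for my dataframe column and is an assumption for illustration only; the weights are built the same way as above (inverse class frequency). It also prints the total weight held by each class, which may be relevant since torch.multinomial draws each index with probability proportional to its weight.

```python
import torch

torch.manual_seed(0)  # only so this sketch gives repeatable numbers

num_pos, num_neg = 1_000, 50_000

# Synthetic labels standing in for the real dataframe column:
# 1 = positive, 0 = negative
labels = torch.cat([
    torch.ones(num_pos, dtype=torch.long),
    torch.zeros(num_neg, dtype=torch.long),
])

# Per-sample weight = inverse frequency of that sample's class
class_weight = torch.tensor([1.0 / num_neg, 1.0 / num_pos])
weights = class_weight[labels]

sampled_indices = torch.multinomial(
    weights,
    num_samples=num_pos + num_neg,
    replacement=True,
)
sampled_labels = labels[sampled_indices]

print("sampled positives:", int((sampled_labels == 1).sum()))
print("sampled negatives:", int((sampled_labels == 0).sum()))

# Total weight held by each class
print("weight sum (pos):", float(weights[labels == 1].sum()))
print("weight sum (neg):", float(weights[labels == 0].sum()))
```

In this sketch both class totals come out to roughly 1 (1K x 1e-3 and 50K x 2e-5), so I suspect that is connected to the near-equal counts, but I would appreciate confirmation.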
Moreover, I would like to adjust the positive-to-negative ratio. How do I implement that with torch.multinomial?
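One idea I am considering, as an unverified sketch rather than a known-good recipe: since torch.multinomial samples each index with probability proportional to its weight, scaling each class's weights so that the class totals are in the desired ratio might work. The `target_pos` and `target_neg` values and the synthetic `labels` tensor below are hypothetical and only there for illustration.

```python
import torch

torch.manual_seed(0)  # only so this sketch gives repeatable numbers

num_pos, num_neg = 1_000, 50_000

# Synthetic labels standing in for the real data: 1 = positive, 0 = negative
labels = torch.cat([
    torch.ones(num_pos, dtype=torch.long),
    torch.zeros(num_neg, dtype=torch.long),
])

# Hypothetical target: 1 positive for every 3 negatives after resampling
target_pos, target_neg = 1.0, 3.0

# Each class's weights sum to its target share:
# negatives sum to target_neg, positives sum to target_pos
class_weight = torch.tensor([target_neg / num_neg, target_pos / num_pos])
weights = class_weight[labels]

sampled_indices = torch.multinomial(
    weights,
    num_samples=num_pos + num_neg,
    replacement=True,
)

pos_fraction = (labels[sampled_indices] == 1).float().mean().item()
# Expected to land near target_pos / (target_pos + target_neg) = 0.25
print(f"positive fraction: {pos_fraction:.3f}")
```

Is this the right way to do it, or is there a more standard approach?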
Thanks.