Hi,
I am dealing with an imbalanced dataset, say #(positive) = 1K and #(negative) = 50K.
I found a library, imbalanced-dataset-sampler (which is based on torch.multinomial), that resamples the data to reduce the skew.
In its basic usage, it passes an array of per-sample weights to torch.multinomial, which then returns the sampled indices (with replacement).
Example
import torch
# weight for each data point: 2e-5 = 1/#(negative), 1e-3 = 1/#(positive)
# (torch.multinomial expects a float tensor, not a Python list)
weights = torch.tensor([2e-5, 1e-3, 2e-5, ...])
sampled_indices = torch.multinomial(
    weights,
    num_samples=num_pos + num_neg,
    replacement=True,
)
However, when I query the original dataframe with these sampled indices, #(positive) and #(negative) come out almost equal. I am wondering what the reason behind this is. Can someone give me some tips?
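To make this easier to reproduce, here is a minimal self-contained sketch of what I am seeing. It assumes the class sizes mentioned above and uses a synthetic `labels` tensor (a hypothetical stand-in for the label column of my dataframe, with 1 = positive and 0 = negative):

```python
import torch

torch.manual_seed(0)  # only to make this sketch repeatable

num_pos, num_neg = 1_000, 50_000
# hypothetical label tensor: 1 = positive, 0 = negative
labels = torch.cat([torch.ones(num_pos, dtype=torch.long),
                    torch.zeros(num_neg, dtype=torch.long)])

# per-sample weight = 1 / (size of that sample's class)
class_counts = torch.bincount(labels)            # index 0: negatives, index 1: positives
weights = 1.0 / class_counts[labels].float()

sampled_indices = torch.multinomial(
    weights,
    num_samples=num_pos + num_neg,
    replacement=True,
)

# count how many sampled indices point at each class
sampled_labels = labels[sampled_indices]
n_pos_sampled = int((sampled_labels == 1).sum())
n_neg_sampled = int((sampled_labels == 0).sum())
print(n_pos_sampled, n_neg_sampled)
```

The two printed counts come out close to each other, even though the original data is 1K vs 50K.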
Moreover, I would like to adjust the positive-to-negative ratio. How can I implement this with torch.multinomial?
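One idea I have in mind, as a rough sketch rather than a confirmed solution: since torch.multinomial treats the inputs as unnormalized weights, I could scale the positive weights so that the total weight of each class matches the ratio I want. The helper `ratio_weights` and its `pos_to_neg` parameter are names I made up for illustration, and the sketch assumes binary labels with 1 = positive:

```python
import torch

torch.manual_seed(0)  # only to make this sketch repeatable

def ratio_weights(labels: torch.Tensor, pos_to_neg: float) -> torch.Tensor:
    """Per-sample weights whose class totals are pos_to_neg (positives) and 1 (negatives)."""
    is_pos = labels == 1
    num_pos = int(is_pos.sum())
    num_neg = int((~is_pos).sum())
    weights = torch.empty(labels.shape[0], dtype=torch.float)
    weights[is_pos] = pos_to_neg / num_pos   # positives share a total weight of pos_to_neg
    weights[~is_pos] = 1.0 / num_neg         # negatives share a total weight of 1
    return weights

num_pos, num_neg = 1_000, 50_000
labels = torch.cat([torch.ones(num_pos, dtype=torch.long),
                    torch.zeros(num_neg, dtype=torch.long)])

# aim for roughly 1 positive per 4 negatives in the resampled data
weights = ratio_weights(labels, pos_to_neg=0.25)
sampled_indices = torch.multinomial(
    weights,
    num_samples=num_pos + num_neg,
    replacement=True,
)

pos_fraction = (labels[sampled_indices] == 1).float().mean().item()
print(pos_fraction)
```

With this weighting the expected fraction of positives is pos_to_neg / (1 + pos_to_neg), so about 0.2 for the value used here. Is this a reasonable way to do it?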
Thanks.