I am unable to find code showing how to perform random oversampling and random undersampling on an imbalanced dataset. Is there a pre-existing package or some code that is commonly used? I am relatively new to PyTorch. Any help would be appreciated!
You could use the WeightedRandomSampler to over- or undersample the dataset.
Here is an example showing how to use it.
Thanks for the reply! Does WeightedRandomSampler change the class weights, or does it actually oversample/undersample the data? Is there any difference between the two approaches?
WeightedRandomSampler is a custom sampler, which draws the samples from your Dataset given the specified weights and can be used to over- or undersample the dataset.
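As a rough illustration of the sampler approach, here is a minimal sketch. The toy dataset (900 samples of class 0, 100 of class 1), the inverse-class-count weights, and the batch size are assumptions made up for this example, not something prescribed by PyTorch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Hypothetical imbalanced toy dataset: 900 samples of class 0, 100 of class 1.
targets = torch.cat([
    torch.zeros(900, dtype=torch.long),
    torch.ones(100, dtype=torch.long),
])
data = torch.randn(len(targets), 8)
dataset = TensorDataset(data, targets)

# One weight per class (inverse class count, an assumed but common choice),
# then expanded to one weight per sample by indexing with the targets.
class_counts = torch.bincount(targets)
class_weights = 1.0 / class_counts.float()
sample_weights = class_weights[targets]

# replacement=True allows minority samples to be drawn more than once;
# num_samples sets how many draws make up one pass over the loader.
sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    replacement=True,
)

# Pass the sampler instead of shuffle=True (the two are mutually exclusive).
loader = DataLoader(dataset, batch_size=100, sampler=sampler)

# Count how often each class was actually drawn during one pass.
drawn = torch.cat([y for _, y in loader])
drawn_counts = torch.bincount(drawn, minlength=2)
print(drawn_counts)
```

With these weights each class is drawn with roughly equal probability, so the minority class is oversampled and the majority class is undersampled within the same pass. The underlying dataset itself is not modified.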
Class weighting can be enabled via the weight argument in the criterion (e.g. nn.CrossEntropyLoss) and would apply the class weights to the loss calculation.
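A minimal sketch of the class-weighting approach could look like this. The toy targets, the random logits standing in for model outputs, and the inverse-frequency weighting formula are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical imbalanced targets: 900 samples of class 0, 100 of class 1.
targets = torch.cat([
    torch.zeros(900, dtype=torch.long),
    torch.ones(100, dtype=torch.long),
])

# Inverse-frequency class weights (one common choice, assumed here):
# total / (num_classes * count_per_class).
class_counts = torch.bincount(targets)
class_weights = class_counts.sum() / (len(class_counts) * class_counts.float())

# The weight argument rescales each sample's loss by the weight of its target class.
criterion = nn.CrossEntropyLoss(weight=class_weights)

# Random logits stand in for the output of a real model.
logits = torch.randn(len(targets), 2)
loss = criterion(logits, targets)

# Equivalent manual computation: with the default 'mean' reduction the
# weighted per-sample losses are normalized by the sum of the applied weights.
per_sample = nn.CrossEntropyLoss(reduction="none")(logits, targets)
applied = class_weights[targets]
manual_loss = (per_sample * applied).sum() / applied.sum()
print(loss.item(), manual_loss.item())
```

Here every sample is still seen exactly once per epoch; only its contribution to the loss (and therefore to the gradients) changes.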
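Both approaches can be used to counter the effects of an imbalanced dataset, but they work differently: the sampler changes which samples are drawn into each batch, while the class weights leave the sampling untouched and rescale each sample's contribution to the loss.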