How to oversample the imbalanced classes in BoW

Suppose my task is to build a BoW predictor from image features: inputs are image features and outputs are BoW vectors. The elements of the BoW vector are cat, dog, pig, panda and penguin. The BoW labels in the training data are imbalanced, for example: [1, 0, 0, 0, 0], [1, 0, 1, 0, 0], [1, 1, 1, 0, 0], [1, 0, 0, 0, 1], [1, 0, 1, 1, 0], …

My question is: how should I oversample the data? In the example above, panda is the minority class. If I try to increase the frequency of panda by oversampling, cat and pig increase as well, because they appear in the same samples.
Should I use torch.utils.data.distributed.DistributedSampler?

This post has some suggestions for different sampling implementations when dealing with an imbalanced multi-label dataset. :slight_smile:
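As a side note, `DistributedSampler` only partitions the dataset across processes for distributed training; it does not rebalance classes. For imbalance, `torch.utils.data.WeightedRandomSampler` is the usual tool. In the multi-label case one common heuristic is to weight each sample by its rarest positive label, so samples containing panda are drawn more often even though they also contain frequent classes like cat. A minimal sketch using the toy labels from the question (the weighting scheme is one heuristic among several, not the only option):

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Toy multi-label targets from the question; columns are
# [cat, dog, pig, panda, penguin].
labels = torch.tensor([
    [1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 0, 1, 1, 0],
], dtype=torch.float)

# Per-class frequency across the dataset (cat: 5, panda: 1, ...).
class_counts = labels.sum(dim=0)
# Rarer class -> larger weight; clamp avoids division by zero.
class_weights = 1.0 / class_counts.clamp(min=1)

# Weight each sample by the rarest class it contains, so the
# panda sample gets a higher draw probability than cat-only samples.
sample_weights = (labels * class_weights).max(dim=1).values

sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    replacement=True,  # required so minority samples can repeat
)
# Pass `sampler=sampler` to your DataLoader instead of shuffle=True.
```

Note that this cannot fully decouple the classes: every draw of the panda sample still brings a cat and a pig along with it, which is inherent to multi-label oversampling. The weighting only shifts the expected per-class frequencies closer together.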