Hi everyone,

I am working on a multilabel classification problem. The training dataset has a very high class imbalance: some classes have thousands of occurrences while others have fewer than 10. I am therefore trying different methods to handle the imbalance.

Among them, I tried WeightedRandomSampler, where the weights are the square root of the inverse class frequency (plus a regularization term). This way I expect the rare classes to be oversampled and the common classes to be undersampled. The final size of the dataset is the same as the original.
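For reference, a minimal sketch of what I mean (toy labels; the regularization term `eps` and the choice of taking the max over a sample's label weights are assumptions of this sketch, since a multilabel sample needs a single weight):

```python
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

# labels: (N, C) multi-hot matrix — toy example with one rare class pair
labels = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
], dtype=np.float64)

class_freq = labels.sum(axis=0)                  # occurrences per class
eps = 1.0                                        # regularization term (assumed value)
class_weights = 1.0 / np.sqrt(class_freq + eps)  # sqrt of inverse class frequency

# One weight per sample: taking the max favours its rarest label
# (one of several reasonable aggregation choices).
sample_weights = (labels * class_weights).max(axis=1)

sampler = WeightedRandomSampler(
    weights=torch.as_tensor(sample_weights),
    num_samples=len(labels),   # keep the epoch size equal to the original dataset
    replacement=True,
)
```

Samples carrying a rare label then get drawn more often, while frequent-only samples are drawn less often in expectation.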

I group the classes according to their frequency (e.g. classes with 0-10 occurrences, 10-20, 20-100, etc.) and look at the AUROC and F1 averaged per group, to see whether my method improves performance on the minority classes.
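Concretely, the grouping looks like this (toy frequencies and scores; the exact bin edges beyond 0-10/10-20/20-100 are an assumption):

```python
import numpy as np

class_freq = np.array([3, 8, 15, 40, 250])                  # toy occurrence counts
per_class_auroc = np.array([0.60, 0.65, 0.70, 0.80, 0.90])  # toy per-class scores

bins = [0, 10, 20, 100, np.inf]            # frequency-group edges
group_idx = np.digitize(class_freq, bins) - 1  # group index for each class

# Mean AUROC per frequency group
group_auroc = {
    f"{bins[g]}-{bins[g + 1]}": per_class_auroc[group_idx == g].mean()
    for g in np.unique(group_idx)
}
```

I then compare these group means between the baseline and each sampling strategy.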

**Surprisingly, upweighting the minority classes yields a substantial decrease in performance globally, and in particular for the minority class groups. Can someone explain why this happens?**

In the plot, light blue is the baseline; red and purple are different WeightedRandomSampler strategies.