Hi everyone,

I am working on a multilabel classification problem. The training dataset has a very high class imbalance: some classes have thousands of occurrences while others have fewer than 10. I am therefore trying different methods to handle the imbalance.

Among them, I tried WeightedRandomSampler, where the weights are the square root of the inverse class frequency (plus a regularization term). This way I expect the rare classes to be oversampled and the common classes to be undersampled. The final size of the dataset is the same as the original.
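For reference, a minimal sketch of what I mean (toy labels; the regularization term `eps` and the choice of taking the max over a sample's label weights are assumptions of this sketch, since a multilabel sample needs a single weight):

```python
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

# labels: (N, C) multi-hot matrix — toy example with one rare class pair
labels = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
], dtype=np.float64)

class_freq = labels.sum(axis=0)                  # occurrences per class
eps = 1.0                                        # regularization term (assumed value)
class_weights = 1.0 / np.sqrt(class_freq + eps)  # sqrt of inverse class frequency

# One weight per sample: taking the max favours its rarest label
# (one of several reasonable aggregation choices).
sample_weights = (labels * class_weights).max(axis=1)

sampler = WeightedRandomSampler(
    weights=torch.as_tensor(sample_weights),
    num_samples=len(labels),   # keep the epoch size equal to the original dataset
    replacement=True,
)
```

Samples carrying a rare label then get drawn more often, while frequent-only samples are drawn less often in expectation.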

I group the classes according to their frequency (e.g. classes with 0-10 occurrences, 10-20, 20-100, etc.) and look at the AUROC and F1 averaged per group, to see whether my method improves performance on the minority classes.
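Concretely, the grouping looks like this (toy frequencies and scores; the exact bin edges beyond 0-10/10-20/20-100 are an assumption):

```python
import numpy as np

class_freq = np.array([3, 8, 15, 40, 250])                  # toy occurrence counts
per_class_auroc = np.array([0.60, 0.65, 0.70, 0.80, 0.90])  # toy per-class scores

bins = [0, 10, 20, 100, np.inf]            # frequency-group edges
group_idx = np.digitize(class_freq, bins) - 1  # group index for each class

# Mean AUROC per frequency group
group_auroc = {
    f"{bins[g]}-{bins[g + 1]}": per_class_auroc[group_idx == g].mean()
    for g in np.unique(group_idx)
}
```

I then compare these group means between the baseline and each sampling strategy.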

**Surprisingly, upweighting the minority classes yields a substantial decrease in performance globally, and in particular for the minority class groups. Can someone explain why this happens?**

In the plot, light blue is the baseline; red and purple are different WeightedRandomSampler strategies.