Should class-weights be used to compensate for imbalance?

Standard image classification design with CNN: The cross-entropy loss function has a weight option, which can apparently be used to compensate for class imbalance. This essentially scales up the learning rate for images of the smaller classes, right? How would this be better than including multiple copies of the images? On average it should have the same effect on the gradient, without introducing “shocks” in the form of occasional long jumps in the parameter space.

Of course, you can use non-integer class weights, which does not quite fit the multiple-copies approach, but does up-weighting have other benefits compared to using multiple copies?
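To make the “same average gradient” intuition concrete: a sample with an integer class weight `w` contributes exactly the same summed loss (and hence the same gradient) as `w` identical copies of that sample. A minimal pure-Python sketch (the `ce` helper is a hypothetical stand-alone cross-entropy, not a library call):

```python
import math

def ce(logits, target):
    # Softmax cross-entropy for a single sample, via log-sum-exp.
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[target]

# One minority-class sample with weight 3 ...
single = 3.0 * ce([0.2, 1.5], 1)

# ... contributes the same total loss as three identical copies of it:
copies = sum(ce([0.2, 1.5], 1) for _ in range(3))

assert abs(single - copies) < 1e-12
```

The difference in practice is when that contribution arrives: the weighted version applies it in one step, while duplicated copies spread it across the batches they land in.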

The loss for each sample is scaled by the weight assigned to its target class.
I.e. in the default setup (calculating the mean of the batch loss), the final loss is normalized by the sum of the applied weights, so that the overall gradient doesn’t explode in case you sample a batch full of samples with a high weight.
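That normalization can be sketched in pure Python. This mirrors the behavior of PyTorch’s `nn.CrossEntropyLoss(weight=..., reduction='mean')` (the helper names here are illustrative, not library APIs): the weighted losses are divided by the sum of the weights that were actually used, not by the batch size, so rescaling all weights by a constant leaves the mean loss unchanged.

```python
import math

def cross_entropy(logits, target):
    # Softmax cross-entropy for one sample, via log-sum-exp.
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits)) - logits[target]

def weighted_mean_ce(batch_logits, targets, class_weights):
    # Per-sample losses are scaled by the weight of their target class,
    # then normalized by the SUM of those weights, not by the batch size.
    losses = [cross_entropy(z, t) for z, t in zip(batch_logits, targets)]
    w = [class_weights[t] for t in targets]
    return sum(wi * li for wi, li in zip(w, losses)) / sum(w)
```

Because of this normalization, a batch that happens to contain only high-weight samples does not produce a proportionally larger loss.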

From my experience, I got better results from oversampling the minority class, but I wouldn’t generalize that, and others might prefer a weighted criterion.
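In PyTorch, oversampling is typically done with `torch.utils.data.WeightedRandomSampler`, which draws sample indices with replacement using per-sample weights. A minimal pure-Python sketch of that idea, with inverse-frequency weights (the numbers here are made up for illustration):

```python
import random
from collections import Counter

random.seed(0)
labels = [0] * 90 + [1] * 10          # imbalanced: 90% class 0, 10% class 1
counts = Counter(labels)

# Inverse-frequency weight per sample, so each class is drawn equally often.
weights = [1.0 / counts[y] for y in labels]

# Draw one "epoch" of indices with replacement, as WeightedRandomSampler does.
epoch = random.choices(range(len(labels)), weights=weights, k=1000)
drawn = Counter(labels[i] for i in epoch)
# drawn is now roughly balanced between the two classes.
```

Note that sampling with replacement means the minority images are seen many times per epoch, which is the “multiple copies” effect from the original question, just randomized per batch.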