Standard image classification setup with a CNN: the cross-entropy loss function has a weight option, which can apparently be used to compensate for class imbalance. As I understand it, this essentially scales up the gradient step (effectively the learning rate) for images of the smaller classes, right? How would this be better than including multiple copies of those images in the training set? Duplication should on average have the same effect on the gradient, without introducing "shocks" in the form of occasional long jumps in parameter space whenever a heavily up-weighted rare example comes along.
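To make the comparison concrete, here is roughly what I mean by the weight option, assuming PyTorch (where `nn.CrossEntropyLoss` accepts a per-class `weight` tensor); the class counts are made up:

```python
import torch
import torch.nn as nn

# Hypothetical imbalanced dataset: 900 images of class 0, 100 of class 1.
class_counts = torch.tensor([900.0, 100.0])

# Inverse-frequency weights: the rare class gets a proportionally larger weight.
weights = class_counts.sum() / (len(class_counts) * class_counts)

# The weighted loss multiplies each sample's loss by the weight of its target class.
# With the default reduction="mean", PyTorch divides by the sum of the weights in
# the batch rather than by the batch size.
criterion = nn.CrossEntropyLoss(weight=weights)

# Dummy batch: 8 samples, 2 classes.
logits = torch.randn(8, 2)
targets = torch.randint(0, 2, (8,))
loss = criterion(logits, targets)
```

So a minority-class image contributes a gradient roughly `weights[1] / weights[0]` times larger than a majority-class image, which is what I meant by the occasional long jump.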
Of course, you can use non-integer class weights, which the multiple-copies approach cannot quite replicate, but does up-weighting have any other benefits compared to using multiple copies?
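For reference, the "multiple copies" alternative I have in mind is essentially oversampling the minority class, e.g. with a `WeightedRandomSampler` so that minority images are drawn more often per epoch (again just a sketch with hypothetical data, assuming PyTorch):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Hypothetical imbalanced dataset: 900 class-0 images, 100 class-1 images.
images = torch.randn(1000, 3, 32, 32)
labels = torch.cat([torch.zeros(900, dtype=torch.long),
                    torch.ones(100, dtype=torch.long)])
dataset = TensorDataset(images, labels)

# Per-sample weights proportional to inverse class frequency, so rare-class
# images are sampled more often instead of having their gradients scaled up.
class_counts = torch.bincount(labels).float()
sample_weights = 1.0 / class_counts[labels]

sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(dataset),
                                replacement=True)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)
```

Here every sample still contributes an equally sized gradient; the imbalance is corrected by how often each class appears in a batch rather than by scaling, which is why I would expect fewer large single-step jumps.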