Assuming more data (1 above) is out of the picture, my go-to's for imbalanced datasets are stratified sampling (3 above) and weighted loss (6 above). See `WeightedRandomSampler` (discussed on the forums) and the `weight` parameter of cross-entropy loss, respectively.
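For the sampling route, here's a minimal sketch of `WeightedRandomSampler` with inverse-frequency per-sample weights. The dataset, counts, and batch size are made up for illustration:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, WeightedRandomSampler

# Toy imbalanced dataset: 90 samples of class 0, 10 of class 1 (illustrative numbers)
labels = torch.tensor([0] * 90 + [1] * 10)
features = torch.randn(100, 4)
dataset = TensorDataset(features, labels)

# Per-class inverse-frequency weights, then broadcast to one weight per sample
class_counts = torch.bincount(labels)        # tensor([90, 10])
class_weights = 1.0 / class_counts.float()
sample_weights = class_weights[labels]       # shape (100,)

# Sample with replacement so rare-class examples can repeat within an epoch
sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(sample_weights),
                                replacement=True)
loader = DataLoader(dataset, batch_size=20, sampler=sampler)

# Batches now draw both classes roughly evenly
xb, yb = next(iter(loader))
```

Note you pass the `sampler` to the `DataLoader` instead of `shuffle=True`; the two are mutually exclusive.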
Weighted loss is a little easier to implement, so that's usually where I start. Stratification is touchier: weighting so every class is drawn evenly often doesn't generalize well, and finding the sweet spot can be a pain, especially once you have more than two classes.
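For the weighted-loss route, here's a sketch of passing class weights to `CrossEntropyLoss`, with a tempering exponent as the "sweet spot" knob. The class counts and the `alpha` value are made up for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical 3-class problem with counts [700, 250, 50]
class_counts = torch.tensor([700.0, 250.0, 50.0])

# Temper the inverse-frequency weights: alpha=1.0 fully rebalances,
# alpha=0.0 is unweighted; values in between are the tuning knob.
alpha = 0.5
weights = (1.0 / class_counts) ** alpha
weights = weights * len(class_counts) / weights.sum()  # rescale so weights average to 1

criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 3)            # stand-in for model outputs
targets = torch.randint(0, 3, (8,))   # stand-in for labels
loss = criterion(logits, targets)
```

The rescaling keeps the overall loss magnitude comparable across `alpha` values, so you're tuning the class balance, not the effective learning rate.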