Assuming more data (1 above) is out of the picture, my go-to's for imbalanced datasets are stratified sampling (3 above) and weighted loss (6 above). See `WeightedRandomSampler` (discussed on the forums) and the `weight` parameter of cross-entropy loss, respectively.
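For the sampling route, here's a minimal sketch of `WeightedRandomSampler` with inverse-frequency per-sample weights. The dataset, counts, and batch size are made up for illustration:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, WeightedRandomSampler

# Toy imbalanced dataset: 90 samples of class 0, 10 of class 1 (illustrative numbers)
labels = torch.tensor([0] * 90 + [1] * 10)
features = torch.randn(100, 4)
dataset = TensorDataset(features, labels)

# Per-class inverse-frequency weights, then broadcast to one weight per sample
class_counts = torch.bincount(labels)        # tensor([90, 10])
class_weights = 1.0 / class_counts.float()
sample_weights = class_weights[labels]       # shape (100,)

# Sample with replacement so rare-class examples can repeat within an epoch
sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(sample_weights),
                                replacement=True)
loader = DataLoader(dataset, batch_size=20, sampler=sampler)

# Batches now draw both classes roughly evenly
xb, yb = next(iter(loader))
```

Note you pass the `sampler` to the `DataLoader` instead of `shuffle=True`; the two are mutually exclusive.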
Weighted loss is a little easier to implement, so that's usually where I start. Stratification is touchier: weighting so every class is drawn evenly often doesn't generalize well, and finding the sweet spot can be a pain, especially once you have more than two classes.
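For the weighted-loss route, here's a sketch of passing class weights to `CrossEntropyLoss`, with a tempering exponent as the "sweet spot" knob. The class counts and the `alpha` value are made up for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical 3-class problem with counts [700, 250, 50]
class_counts = torch.tensor([700.0, 250.0, 50.0])

# Temper the inverse-frequency weights: alpha=1.0 fully rebalances,
# alpha=0.0 is unweighted; values in between are the tuning knob.
alpha = 0.5
weights = (1.0 / class_counts) ** alpha
weights = weights * len(class_counts) / weights.sum()  # rescale so weights average to 1

criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 3)            # stand-in for model outputs
targets = torch.randint(0, 3, (8,))   # stand-in for labels
loss = criterion(logits, targets)
```

The rescaling keeps the overall loss magnitude comparable across `alpha` values, so you're tuning the class balance, not the effective learning rate.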