I am working on a model that will be trained on 4-channel medical image data. I noticed that the labels of the images are not distributed evenly. To make the model pay more attention to the under-represented classes, I am planning to augment the training data. But if I apply generic image augmentation, it may have no effect, because the augmentation would affect every class equally. I need to augment the under-represented classes specifically, so that the model sees them more often. I could tune this manually, e.g. augment the rare classes 5 times while the common classes get augmented 2 times, but hand-tuning like that feels risky — it could distort the data the model sees.
Are there any methods or advice that you have or like using? It would be much appreciated.
I don’t see anything wrong with augmenting classes that appear less
frequently more heavily than classes that appear more frequently.
But my guess is that you would be a little bit better off using a WeightedRandomSampler to sample the less-frequent classes more
heavily (and if you do augment your training data, don’t augment any
differently based on class).
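A minimal sketch of the `WeightedRandomSampler` approach, using a made-up imbalanced 3-class label set and a toy 4-channel tensor dataset (the counts, shapes, and names are illustrative, not from your data):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# toy imbalanced labels: 100 / 30 / 10 samples per class (illustrative)
labels = torch.tensor([0] * 100 + [1] * 30 + [2] * 10)
data = torch.randn(len(labels), 4, 8, 8)  # stand-in for 4-channel images

class_counts = torch.bincount(labels).float()   # tensor([100., 30., 10.])
class_weights = 1.0 / class_counts              # rarer class -> larger weight
sample_weights = class_weights[labels]          # one weight per sample

sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(labels),  # draw one epoch's worth of samples
    replacement=True,         # minority samples get drawn repeatedly
)

loader = DataLoader(TensorDataset(data, labels), batch_size=16, sampler=sampler)

# With inverse-frequency weights the sampler draws the classes roughly
# uniformly, so the rare class (label 2) is seen about as often as the
# others -- without any class-specific augmentation.
```

Because `replacement=True`, the rare class will appear as duplicates across batches within an epoch; any (class-agnostic) augmentation applied in the dataset's `__getitem__` keeps those duplicates from being pixel-identical.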
(You could also use the weight constructor argument for CrossEntropyLoss, if this is a multi-class classification problem,
or the conceptually similar pos_weight constructor argument for BCEWithLogitsLoss, if this is a binary classification problem. But
I would prefer the WeightedRandomSampler approach unless it is
likely that any given batch would have duplicate samples in it.)
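For completeness, here is a sketch of the loss-weighting alternative. The class counts are made up, and inverse-frequency weighting is just one common choice for setting `weight` — the constructor accepts any per-class weights you like:

```python
import torch
import torch.nn as nn

# illustrative per-class counts for a 3-class problem
class_counts = torch.tensor([100.0, 30.0, 10.0])

# inverse-frequency weights (normalized so they average to 1)
weight = class_counts.sum() / (len(class_counts) * class_counts)

# multi-class: weight the per-class terms of the cross entropy
ce = nn.CrossEntropyLoss(weight=weight)

logits = torch.randn(8, 3)
targets = torch.randint(0, 3, (8,))
loss = ce(logits, targets)

# binary: pos_weight scales the positive-class term of the loss;
# with 10 positives and 100 negatives, pos_weight = 100 / 10 = 10
bce = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([10.0]))
bin_logits = torch.randn(8, 1)
bin_targets = torch.randint(0, 2, (8, 1)).float()
bin_loss = bce(bin_logits, bin_targets)
```

Unlike the sampler, this keeps each batch's class mix unchanged and instead scales the gradient contribution of the rare classes, so no sample is ever duplicated within a batch.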