Hello! I saw a post (Dealing with imbalanced datasets in pytorch) that suggests using class weights in the cross entropy loss function. My dataset is imbalanced: the training distribution is 1:1, but the testing distribution is 10:1. I set the weights in the cross entropy loss to [0.9, 0.1] to reflect the true distribution. The problem is that the F score for my class 1 is only 0.51, with very low precision and very high recall. Is there any way to fix this?
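For concreteness, here is a minimal pure-Python sketch of what a class-weighted cross entropy computes. It assumes the weighted-mean form that, as far as I understand the documentation, `torch.nn.CrossEntropyLoss` uses with `reduction='mean'`. The toy logits and targets are made up for illustration, and the assumption that the weights are ordered by class index (0.9 for class 0, 0.1 for class 1) is mine.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over one sample's logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def weighted_cross_entropy(batch_logits, targets, class_weights):
    # Weighted mean: sum_i w[y_i] * (-log p_i[y_i]) / sum_i w[y_i]
    num = 0.0
    den = 0.0
    for logits, y in zip(batch_logits, targets):
        nll = -log_softmax(logits)[y]
        num += class_weights[y] * nll
        den += class_weights[y]
    return num / den

# Hypothetical toy batch: one class-0 sample and two class-1 samples.
batch_logits = [[2.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
targets = [0, 1, 1]

unweighted = weighted_cross_entropy(batch_logits, targets, [1.0, 1.0])
weighted = weighted_cross_entropy(batch_logits, targets, [0.9, 0.1])
print(unweighted, weighted)
```

With this normalization only the ratio between the weights matters: [0.9, 0.1] and [9, 1] give the same loss value.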

If I resample the training dataset to 10:1, I don’t think I should use any weights. Is that correct, or do we still need to put some weight on the smaller class?

I think resampling has essentially (approximately) the same effect as weighting. Say you have made a prediction on a certain minority sample. If you then make the prediction again on the same sample (because oversampling replicated it), you essentially apply its gradient twice. This is roughly analogous to multiplying the loss weight for that data point by 2.
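The analogy above can be checked on a toy example. This is only a sketch under simplifying assumptions: a two-parameter logistic regression instead of a real network, a summed (not averaged) loss, and a full batch. The data points and the helper name `grad_sum_loss` are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_sum_loss(w, xs, ys, sample_weights):
    # Gradient w.r.t. w of sum_i s_i * BCE(sigmoid(w . x_i), y_i).
    # For logistic regression, the per-sample gradient is (p - y) * x.
    grad = [0.0] * len(w)
    for x, y, s in zip(xs, ys, sample_weights):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
        for j, xj in enumerate(x):
            grad[j] += s * (p - y) * xj
    return grad

w = [0.3, -0.2]                  # hypothetical current parameters
x_major, y_major = [1.0, 2.0], 0
x_minor, y_minor = [0.5, -1.0], 1

# Oversampling: the minority sample appears twice, every sample has weight 1.
g_oversampled = grad_sum_loss(
    w, [x_major, x_minor, x_minor], [y_major, y_minor, y_minor], [1.0, 1.0, 1.0]
)

# Weighting: the minority sample appears once, with weight 2.
g_weighted = grad_sum_loss(w, [x_major, x_minor], [y_major, y_minor], [1.0, 2.0])

print(g_oversampled)
print(g_weighted)
```

The two gradients agree up to floating-point rounding in this setting. With mini-batches and a mean-reduced loss, the duplicated samples land in different batches and the normalization differs, so the match is only approximate.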