Hi,
I’m working on a binary text classification problem. I have a pretty imbalanced dataset. The statistics are shown below.
| Split | Class | Count |
|-------|----------|-------|
| train | positive | 9598 |
| train | negative | 30988 |
| val | positive | 1200 |
| val | negative | 3874 |
| test | positive | 1200 |
| test | negative | 3874 |
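To put a number on the imbalance, here is a quick sketch that computes the positive fraction and the negative-to-positive ratio for each split. The counts are copied from the statistics above; the names `counts` and `neg_pos_ratio` are just ones I made up for illustration.

```python
# Class counts copied from the dataset statistics above.
counts = {
    "train": {"positive": 9598, "negative": 30988},
    "val": {"positive": 1200, "negative": 3874},
    "test": {"positive": 1200, "negative": 3874},
}


def neg_pos_ratio(split):
    """Return (number of negatives) / (number of positives) for a split."""
    c = counts[split]
    return c["negative"] / c["positive"]


for split, c in counts.items():
    total = c["positive"] + c["negative"]
    print(
        f"{split}: positive fraction = {c['positive'] / total:.3f}, "
        f"negative/positive = {neg_pos_ratio(split):.3f}"
    )
```

So in every split roughly one sample in four is positive, and there are a little over three negatives per positive.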
When I searched online and on the forums, I came across a couple of methods to deal with this. I have a couple of questions about them. For this problem, I am using `BCEWithLogitsLoss`, which has a `weight` parameter and a `pos_weight` parameter.
- What is the difference between `weight` and `pos_weight`, and which one should I use for this problem?
- For `pos_weight`, the example says to pass a ratio between the sizes of the positive and negative classes. How will the function know which is the positive class and which is the negative class when using the weight?
- What do I pass for the `weight` parameter?
- Will using PyTorch’s `WeightedRandomSampler` help in this case? Also, I don’t understand the example given for this. Again, what do I pass for the parameters?
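
For concreteness, this is roughly the setup I have in mind. It is only a sketch of my current reading of the docs, not something I am sure is right: I am assuming that `pos_weight` scales the loss term of samples whose target is 1, and that `WeightedRandomSampler` expects one weight per training sample. The toy `logits` and `targets` tensors and the `train_labels` tensor are placeholders I made up; in practice they would come from the model and the real dataset.

```python
import torch
from torch import nn
from torch.utils.data import WeightedRandomSampler

# Training-set class counts from the statistics above.
n_pos, n_neg = 9598, 30988

# Option 1: pos_weight. One value per output unit; it scales the loss
# term of samples whose target is 1 (the "positive" class).
pos_weight = torch.tensor([n_neg / n_pos])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# Placeholder batch: raw logits and float targets, both of shape (batch, 1).
logits = torch.tensor([[0.2], [-1.0], [1.5], [0.3]])
targets = torch.tensor([[1.0], [0.0], [0.0], [1.0]])
loss = criterion(logits, targets)
print("loss with pos_weight:", loss.item())

# Option 2: WeightedRandomSampler. One weight per training sample,
# here the inverse frequency of that sample's class.
# Placeholder labels; the real ones would be read from the dataset.
train_labels = torch.tensor([1] * n_pos + [0] * n_neg)
class_weights = torch.tensor([1.0 / n_neg, 1.0 / n_pos])  # index 0 = negative, 1 = positive
sample_weights = class_weights[train_labels]

sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),  # draw one epoch's worth of samples
    replacement=True,
)
# The sampler would then be passed to the loader, e.g.
# DataLoader(train_dataset, batch_size=32, sampler=sampler)
```

With these weights each class carries the same total sampling weight, so batches should come out roughly balanced on average. I assume one would use either the sampler or `pos_weight`, not both at once, since combining them would correct for the imbalance twice.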
Please let me know if additional information is required and thanks for the help.