Hi,

I’m working on a binary text classification problem. I have a pretty imbalanced dataset. The statistics are shown below.

| Split | Class    | Count |
|-------|----------|-------|
| train | positive | 9598  |
| train | negative | 30988 |
| val   | positive | 1200  |
| val   | negative | 3874  |
| test  | positive | 1200  |
| test  | negative | 3874  |

While searching online and on the forums, I came across a couple of methods for dealing with this, and I have a few questions about them. For this problem, I am using `BCEWithLogitsLoss`, which has a `weight` parameter and a `pos_weight` parameter.
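To make the question concrete, here is a minimal sketch of how I currently understand the two parameters, using the training counts above. It assumes that label 1 means positive and label 0 means negative, and that `pos_weight` should be the number of negatives divided by the number of positives, as the `BCEWithLogitsLoss` docs suggest. The logits and targets are made-up dummy values, not my real data, and I may well have this wrong.

```python
import torch
import torch.nn as nn

# Training-set counts from the table above.
n_pos, n_neg = 9598, 30988

# Assumption: pos_weight = (#negatives / #positives), which scales only
# the loss terms whose target is 1.
pos_weight = torch.tensor([n_neg / n_pos])

# Dummy batch (made-up values): raw logits and float targets.
logits = torch.tensor([0.2, -1.3, 0.7, 2.1])
targets = torch.tensor([1.0, 0.0, 0.0, 1.0])

# Option A: pos_weight, one value per class/output.
loss_pos_weight = nn.BCEWithLogitsLoss(pos_weight=pos_weight)(logits, targets)

# Option B: weight, one rescaling value per batch element.
# Here each positive sample gets pos_weight and each negative sample gets 1.
per_sample_weight = targets * (pos_weight - 1.0) + 1.0
loss_weight = nn.BCEWithLogitsLoss(weight=per_sample_weight)(logits, targets)

# Unweighted baseline for comparison.
loss_plain = nn.BCEWithLogitsLoss()(logits, targets)

print(loss_plain.item(), loss_pos_weight.item(), loss_weight.item())
```

If my understanding is right, options A and B give the same loss for hard 0/1 labels, and the only practical difference is that `weight` has to be rebuilt for every batch while `pos_weight` is fixed once.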

- What is the difference between `weight` and `pos_weight`, and which one should I use for this problem?
- For `pos_weight`, the example says to pass a ratio between the sizes of the positive and negative classes. How will the function know which is the positive class and which is the negative class when using the weight?
- What do I pass for the `weight` parameter?
- Will using PyTorch’s `WeightedRandomSampler` help in this case? Also, I don’t understand the example given for it. Again, what do I pass for the parameters?
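
For the sampler question, this is my best guess at how it is meant to be used: a rough sketch on a toy dataset, not my real one. It assumes the sampler wants one weight per sample (not per class), and that inverse class frequency is a reasonable choice for those weights. The tensor shapes and the `TensorDataset` stand-in are placeholders.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy stand-in for the real dataset: 8 samples, 2 of them positive.
features = torch.randn(8, 5)
labels = torch.tensor([1, 0, 0, 0, 1, 0, 0, 0])
dataset = TensorDataset(features, labels)

# Assumption: weight each sample by the inverse frequency of its class,
# so both classes carry the same total sampling mass.
class_counts = torch.bincount(labels)        # index 0 = negatives, 1 = positives
class_weights = 1.0 / class_counts.float()
sample_weights = class_weights[labels]       # one weight per sample

# Draw as many samples per epoch as the dataset holds, with replacement,
# so minority-class samples can be repeated.
sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    replacement=True,
)

# sampler and shuffle=True are mutually exclusive, so shuffle is left unset.
loader = DataLoader(dataset, batch_size=4, sampler=sampler)

for batch_features, batch_labels in loader:
    print(batch_labels.tolist())
```

With this setup I would expect batches to be roughly balanced on average. I assume it should not be combined with `pos_weight` using the same ratio, since that would correct for the imbalance twice, but I am not sure.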

Please let me know if additional information is required and thanks for the help.