Hi,
I'm trying to solve a binary classification problem (2 labels), so I use nn.BCEWithLogitsLoss as my criterion. I tried to set a weight of 500 for the positive label (which is also the minority class) this way:
It compiles, but I don't think it's the right way, because my loss stays at the same value. If I change it to criterion = nn.BCEWithLogitsLoss(pos_weight=None),
I get a much smaller loss value.
What am I doing wrong? Thank you for your help!
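For reference, a minimal sketch of how pos_weight is typically passed (it must be a tensor, not a plain number; the values and dummy data here are illustrative, not the original poster's code):

```python
import torch
import torch.nn as nn

# pos_weight must be a tensor; for a single binary output its shape is [1].
# 500.0 is the weight mentioned in the post above.
pos_weight = torch.tensor([500.0])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# Dummy logits (raw model outputs, no sigmoid) and float targets,
# just to show the call signature.
logits = torch.tensor([0.2, -1.5, 3.0])
targets = torch.tensor([1.0, 0.0, 1.0])
loss = criterion(logits, targets)
```

With a weight this large, the loss is expected to be much bigger than the unweighted one, which explains the observation above.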
Did you compare the accuracy for these two cases? When you use weights, it's not fair to compare loss values directly; accuracy would be a better metric for comparing performance.
I think that accuracy is not the right metric for me, since my data is very imbalanced (only 3% of the data is my minority class, and the raw data is 200K rows). I can use other metrics, like precision, recall, and F-score.
Is this the right way to use weights?
Thank you
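As a side note, these metrics can be computed without any extra libraries; a plain-Python sketch (the helper name is mine) that also shows why accuracy is misleading here:

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# With 3% positives, a model that always predicts 0 scores 97% accuracy
# but has zero precision, recall and F1 on the minority class.
y_true = [1] * 3 + [0] * 97
y_pred = [0] * 100
print(precision_recall_f1(y_true, y_pred))  # (0.0, 0.0, 0.0)
```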
@Isaac_Kargar I saw this issue, but I still didn't understand how to set weights for each class. If I have 2 classes (a binary problem), do I need to set a weight only for the positive label?
Thank you
As an aside, you mention in a later post that your positive class is
about 3% of your data. You might start out then by setting pos_weight
to about 30, so that (after weighting) your negative and positive classes
are about equally weighted. This won’t necessarily be the best choice
for pos_weight, but it’s a good place to start, and you should certainly
compare it with other values you try (such as 500).
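The starting point above can be derived directly from the class counts; a sketch (the counts are illustrative, based on the ~200K rows / ~3% positives mentioned earlier):

```python
import torch
import torch.nn as nn

# Illustrative counts for ~200K rows with ~3% positives.
num_pos, num_neg = 6_000, 194_000
# negatives / positives ≈ 32, close to the "about 30" suggested above.
pos_weight = torch.tensor([num_neg / num_pos])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```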
This is not necessarily a problem. With pos_weight = None, your
model could easily train to predict your negative class (almost) all
of the time, and almost always be right. So you get a small loss.
When you use pos_weight to increase the weight of your
under-represented positive class, your model actually has to learn
to tell the two classes apart, so you are likely to get a larger, but
more meaningful loss.
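This effect is easy to see numerically; a small sketch (the logit value and weight are illustrative):

```python
import torch
import torch.nn as nn

# A "lazy" model that always emits a strongly negative logit,
# i.e. always predicts the majority (negative) class.
logits = torch.full((100,), -5.0)
targets = torch.zeros(100)
targets[:3] = 1.0   # 3% positives, as in the thread

unweighted = nn.BCEWithLogitsLoss()(logits, targets)
weighted = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([30.0]))(logits, targets)
# unweighted is tiny even though every positive is misclassified;
# weighted makes those mistakes expensive.
```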
I think this doesn’t apply to Liran’s use case. It sounds like he has a single-label, binary classification problem. So pos_weight should
have a size of 1 rather than 2.
(The terminology is a little bit confusing. When we talk about a binary
problem, we do say that there are two classes. But when we talk
about a multi-label, multi-class problem, we talk about nClass
classes. But the multi-label problem should be understood as nClass binary problems (that share the same network) – that is,
each of the nClass labels can be absent or present. The example
you link to is a 64-class multi-label, multi-class problem. It therefore
consists of 64 binary problems, each one having its own value of pos_weight.)
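The shape distinction can be sketched as follows (the weight values and tensor sizes are illustrative):

```python
import torch
import torch.nn as nn

# Single-label binary problem: pos_weight has shape [1].
binary_criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([30.0]))
binary_loss = binary_criterion(torch.randn(8, 1), torch.ones(8, 1))

# 64-class multi-label problem: one pos_weight per label, shape [64].
per_label_weights = torch.ones(64)   # illustrative values
multilabel_criterion = nn.BCEWithLogitsLoss(pos_weight=per_label_weights)
multilabel_loss = multilabel_criterion(torch.randn(8, 64), torch.ones(8, 64))
```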
Assuming your data has 25% positive pixels and 75% negative pixels, a good weight value would be 75/25 = 3:
positive_weight = torch.ones([1])*3.0
criterion = nn.BCEWithLogitsLoss(pos_weight=positive_weight)
My only confusion is: do I need to have an extra singleton dimension for the class, or doesn't it matter?
predicted_mask = [batch_size, 1, height, width] OR [batch_size, height, width]
gt_mask = [batch_size, 1, height, width] OR [batch_size, height, width]
I would call the “extra one dimension” (singleton dimension) a “channel”
dimension (rather than “class”).
Regardless of terminology, as long as the shapes of predicted_mask
and gt_mask match, both ways will work and are essentially equivalent.
If the output of the model and gt_mask (from, say, your dataloader) already
match, I wouldn’t bother doing anything. If one has the singleton dimension
and the other doesn’t, I would probably squeeze() the extra dimension away.
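A sketch of the squeeze() fix for the mismatched case (the sizes are illustrative):

```python
import torch

batch_size, height, width = 4, 32, 32
predicted_mask = torch.randn(batch_size, 1, height, width)  # has the channel dim
gt_mask = torch.rand(batch_size, height, width).round()     # lacks it

# squeeze() away the singleton channel dimension so the shapes match.
predicted_mask = predicted_mask.squeeze(1)
```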