I think accuracy is not the right metric for me, since my data is very imbalanced (only 3% of the data is the minority class, and the raw data is 200K rows). I can use other metrics, like precision, recall, and F-score.
Is this the right way to use weights?
As an aside, you mention in a later post that your positive class is
about 3% of your data. You might start out then by setting pos_weight
to about 30, so that (after weighting) your negative and positive classes
are about equally weighted. This won’t necessarily be the best choice
for pos_weight, but it’s a good place to start, and you should certainly
compare it with other values you try (such as 500).
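The starting point described above can be sketched as follows. The class counts here are illustrative (chosen to match the "about 3% positive" figure mentioned, not taken from Liran's actual data):

```python
import torch
import torch.nn as nn

# Hypothetical counts for a dataset that is ~3% positive
# (illustrative numbers, not from the original posts).
n_total = 200_000
n_pos = 6_000                       # ~3% minority class
n_neg = n_total - n_pos

# Start with pos_weight = n_neg / n_pos (~32 here), then tune it.
pos_weight = torch.tensor([n_neg / n_pos])
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, 1)          # raw model outputs (no sigmoid)
targets = torch.randint(0, 2, (8, 1)).float()
loss = criterion(logits, targets)
```

The value of `pos_weight` is a hyperparameter; `n_neg / n_pos` simply balances the two classes after weighting and is the place to start searching from.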
This is not necessarily a problem. With pos_weight = None, your
model could easily train to predict your negative class (almost) all
of the time, and almost always be right. So you get a small loss.
When you use pos_weight to increase the weight of your
under-represented positive class, your model actually has to learn
to tell the two classes apart, so you are likely to get a larger, but
more meaningful loss.
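To make the "small but misleading loss" point concrete, here is a sketch comparing the loss of a model that confidently predicts "negative" everywhere, with and without `pos_weight` (the 3%-positive split and the logit value are illustrative assumptions):

```python
import torch
import torch.nn as nn

# Targets that are ~3% positive; logits that confidently say "negative".
targets = torch.zeros(1000)
targets[:30] = 1.0                          # 3% positives
always_neg_logits = torch.full((1000,), -10.0)

plain = nn.BCEWithLogitsLoss()
weighted = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([30.0]))

loss_plain = plain(always_neg_logits, targets)
loss_weighted = weighted(always_neg_logits, targets)
# The weighted loss penalizes the missed positives ~30x more heavily,
# so always predicting "negative" is no longer a cheap strategy.
```
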
I think this doesn’t apply to Liran’s use case. It sounds like he has a single-label, binary classification problem. So pos_weight should
have a size of 1 rather than 2.
(The terminology is a little bit confusing. When we talk about a binary
problem, we do say that there are two classes. But when we talk
about a multi-label, multi-class problem, we talk about nClass
classes. But the multi-label problem should be understood as nClass binary problems (that share the same network) – that is,
each of the nClass labels can be absent or present. The example
you link to is a 64-class multi-label, multi-class problem. It therefore
consists of 64 binary problems, each one having its own value of pos_weight.)
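In the multi-label case, `pos_weight` therefore carries one entry per label. A minimal sketch, assuming for illustration that every label happens to be 10% positive (real data would have a different frequency per label):

```python
import torch
import torch.nn as nn

# Multi-label setup: 64 independent binary problems sharing one network,
# so pos_weight holds one value per label (shape [64]).
n_labels = 64
# Hypothetical per-label positive frequency (illustrative value).
pos_frac = torch.full((n_labels,), 0.10)
pos_weight = (1.0 - pos_frac) / pos_frac    # neg/pos ratio per label, 9.0 here

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
logits = torch.randn(16, n_labels)
targets = torch.randint(0, 2, (16, n_labels)).float()
loss = criterion(logits, targets)
```
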
Assuming your data has 25% positive pixels and 75% negative pixels, a good value for pos_weight would be 75/25 = 3:
positive_weight = torch.ones(1) * 3.0
criterion = nn.BCEWithLogitsLoss(pos_weight=positive_weight)
My only confusion is: do I need the extra singleton dimension for the class, or does it not matter?
predicted_mask = [batch_size, 1, height, width] OR [batch_size, height, width]
gt_mask = [batch_size, 1, height, width] OR [batch_size, height, width]
I would call the “extra one dimension” (singleton dimension) a “channel”
dimension (rather than “class”).
Regardless of terminology, as long as the shapes of predicted_mask
and gt_mask match, both ways will work and are essentially equivalent.
If the output of the model and gt_mask (from, say, your dataloader) already
match, I wouldn’t bother doing anything. If one has the singleton dimension
and the other doesn’t, I would probably squeeze() the extra dimension away.
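The "both ways are equivalent" claim can be checked directly. This sketch uses made-up tensor sizes and a pos_weight of 3 just for illustration:

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([3.0]))

pred_4d = torch.randn(2, 1, 8, 8)           # [batch, channel, H, W]
gt_4d = torch.randint(0, 2, (2, 1, 8, 8)).float()

# Same tensors with the singleton "channel" dimension squeezed away.
pred_3d = pred_4d.squeeze(1)                # [batch, H, W]
gt_3d = gt_4d.squeeze(1)

loss_4d = criterion(pred_4d, gt_4d)
loss_3d = criterion(pred_3d, gt_3d)
# Both layouts give the same loss, as long as prediction and target match.
```
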