Problem with setting weights for BCEWithLogitsLoss (binary problem)

I'm trying to solve a binary problem (2 labels), so I use nn.BCEWithLogitsLoss as my criterion. I tried to set a weight of 500 for the positive label (which is also the minority class) this way:

torch_weights = torch.ones([1]).cuda()
torch_weights = torch_weights * 500
criterion = nn.BCEWithLogitsLoss(pos_weight=torch_weights)

It runs, but I don't think this is the right way, because my loss stays at the same value, and if I change it to:
criterion = nn.BCEWithLogitsLoss(pos_weight=None)

I get a much smaller value on the loss function.
What am I doing wrong? Thank you for your help!

Did you compare the accuracy of these two cases? When you use weights, it's not fair to compare loss values directly; I think accuracy would be a better metric for comparing performance.

I think accuracy is not the right metric for me, since my data is very imbalanced (only 3% of the data is my minority class, and the raw data has 200K rows). I can use other metrics, like precision, recall, and F-score.
Is this the right way to use weights?
Thank you
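(As a sketch of the metrics mentioned above: precision, recall, and F1 can be computed directly from hard predictions. The labels and predictions below are made up for illustration; they show how an imbalanced dataset makes accuracy misleading while recall exposes the problem.)

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, and F1 for a binary problem (labels 0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 10 samples, only 2 positives: the model finds one positive and misses one.
# Accuracy would be 90%, but recall is only 0.5.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
p, r, f = binary_metrics(y_true, y_pred)
print(p, r, f)  # 1.0 0.5 0.666...
```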

Yes, I think it is better than comparing the loss.

OK, so is this code correct?

torch_weights = torch.ones([1]).cuda()
torch_weights = torch_weights * 500
criterion = nn.BCEWithLogitsLoss(pos_weight=torch_weights)

Here you set the weight to 500, and you have only one class? Shouldn't it be 2? Please check this.

@Isaac_Kargar I saw this issue, but I still didn't understand how to set a weight for each class. If I have 2 classes (a binary problem), do I need to set a weight only for the positive label?
Thank you

Hi Liran and Isaac!

As an aside, you mention in a later post that your positive class is
about 3% of your data. You might start out then by setting pos_weight
to about 30, so that (after weighting) your negative and positive classes
are about equally weighted. This won’t necessarily be the best choice
for pos_weight, but it’s a good place to start, and you should certainly
compare it with other values you try (such as 500).
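A minimal sketch of that starting point (the exact counts here are assumptions extrapolated from "3% of 200K rows"): set pos_weight to the ratio of negative to positive samples.

```python
import torch
import torch.nn as nn

# Assumed counts: ~3% positives out of 200K rows.
num_pos = 6_000
num_neg = 194_000

# neg/pos ratio, so positives and negatives contribute roughly equally.
pos_weight = torch.tensor([num_neg / num_pos])   # shape [1] for a binary problem
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

print(pos_weight.item())  # ~32.3, in the same ballpark as the ~30 above
```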

This is not necessarily a problem. With pos_weight = None, your
model could easily train to predict your negative class (almost) all
of the time, and almost always be right. So you get a small loss.

When you use pos_weight to increase the weight of your
under-represented positive class, your model actually has to learn
to tell the two classes apart, so you are likely to get a larger, but
more meaningful loss.
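A toy illustration of this point (all numbers made up): a model that emits a large negative logit for every sample, i.e. always predicts the negative class, gets a small unweighted loss on imbalanced data, but pos_weight makes that trivial strategy expensive.

```python
import torch
import torch.nn as nn

targets = torch.tensor([0.0] * 97 + [1.0] * 3)   # 3% positives
logits = torch.full((100,), -5.0)                # "always predict negative"

plain = nn.BCEWithLogitsLoss()
weighted = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([97.0 / 3.0]))

print(plain(logits, targets).item())     # small (~0.16): looks deceptively good
print(weighted(logits, targets).item())  # much larger (~4.9): misses are punished
```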

I think this doesn’t apply to Liran’s use case. It sounds like he has a
single-label, binary classification problem. So pos_weight should
have a size of 1 rather than 2.

(The terminology is a little bit confusing. When we talk about a binary
problem, we do say that there are two classes. But when we talk
about a multi-label, multi-class problem, we talk about nClass
classes. But the multi-label problem should be understood as
nClass binary problems (that share the same network) – that is,
each of the nClass labels can be absent or present. The example
you link to is a 64-class multi-label, multi-class problem. It therefore
consists of 64 binary problems, each one having its own value of
pos_weight.)

K. Frank


So for binary image classification, I feel this is how it should be done. Correct me if I am wrong.

Input= [batch_size, channels, height,width]
predicted_mask = [batch_size, 1, height, width]
gt_mask = [batch_size, 1, height, width]

Assuming your data has 25% positive pixels and 75% negative pixels, a good weight value will be 75/25 = 3.

positive_weight = torch.ones([1])*3.0
loss = nn.BCEWithLogitsLoss(pos_weight=positive_weight)

My only confusion is: do I need the extra singleton dimension for the class, or does it not matter?
predicted_mask = [batch_size, 1, height, width] OR [batch_size, height, width]
gt_mask = [batch_size, 1, height, width] OR [batch_size, height, width]

Hi Nikhil!

This is fine.

This is fine.

I would call the “extra one dimension” (singleton dimension) a “channel”
dimension (rather than “class”).

Regardless of terminology, as long as the shapes of predicted_mask
and gt_mask match, both ways will work and are essentially equivalent.

If the output of the model and gt_mask (from, say, your dataloader) already
match, I wouldn’t bother doing anything. If one has the singleton dimension
and the other doesn’t, I would probably squeeze() the extra dimension away.
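As a sketch of that suggestion: if the model output carries a singleton channel dimension and the ground-truth mask does not, squeeze() the extra dimension so the shapes match before calling the loss. The shapes and the weight of 3.0 follow the mask example earlier in the thread.

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss(pos_weight=torch.ones([1]) * 3.0)

predicted_mask = torch.randn(2, 1, 4, 4)             # [batch, 1, H, W]
gt_mask = torch.randint(0, 2, (2, 4, 4)).float()     # [batch, H, W]

# squeeze the channel dimension so both tensors are [batch, H, W]
loss = criterion(predicted_mask.squeeze(1), gt_mask)
```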


K. Frank