# Unclear about Weighted BCE Loss

Hey there super people!
I am having trouble understanding the `weight` parameter of `BCELoss`. I have a binary classification problem: an RNN produces a binary classification for each time step of a sequence. Precisely, it produces an output of size `(batch, sequence_len)` where each element is in the range 0 to 1 (a confidence score for how likely it is that an event happened at that time step).
Because the ground truth is imbalanced (more 0s than 1s), I would like to make use of the `weight` parameter of `BCELoss`, but I clearly do not understand the docs well enough...

What is the size of this parameter and which elements does it take?

If you would like to use the weight as a class weight, I assume you have a binary target, i.e. it contains only zeros and ones.
In that case I would create a weight tensor and just multiply it with your unreduced loss. Here is a small example:

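```python
import torch
import torch.nn as nn

# `output` holds probabilities in [0, 1]; `y` holds float targets of 0s and 1s (same shape)
weight = torch.tensor([0.1, 0.9])  # [weight for class 0, weight for class 1]
weight_ = weight[y.view(-1).long()].view_as(y)  # look up one weight per target element
criterion = nn.BCELoss(reduction='none')  # replaces the deprecated reduce=False
loss = criterion(output, y)
loss_class_weighted = loss * weight_
loss_class_weighted = loss_class_weighted.mean()
```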

Let me know if that works for you.
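
For reference, here is a minimal sketch of how the manual weighting above relates to the `weight` argument itself. It assumes the functional form `torch.nn.functional.binary_cross_entropy`, whose `weight` is applied per element, so a tensor with the same shape as the output works. The tensors and the `[0.1, 0.9]` values are illustrative stand-ins, not data from this thread.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Stand-ins for the RNN output and targets, shape (batch, sequence_len)
output = torch.sigmoid(torch.randn(4, 10))   # probabilities in (0, 1)
y = (torch.rand(4, 10) > 0.8).float()        # imbalanced binary targets

class_weight = torch.tensor([0.1, 0.9])      # illustrative: [class 0, class 1]
elem_weight = class_weight[y.long()]         # one weight per element, same shape as y

# Manual route: unreduced loss times per-element weight, then mean
manual = (F.binary_cross_entropy(output, y, reduction='none') * elem_weight).mean()

# Built-in route: hand the per-element weights to `weight`
builtin = F.binary_cross_entropy(output, y, weight=elem_weight)

print(torch.allclose(manual, builtin))
```

Because the weights here depend on the targets of each batch, the functional call is more convenient than fixing `weight` once in the `nn.BCELoss` constructor.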

Didn't quite work that well, but I just sampled batches with a 1:1 ratio of background:activity labels, which turned out to be a neat solution!

Good to hear.
Maybe the new `pos_weight` argument for `nn.BCEWithLogitsLoss` might work better, since my weighting approach differs from the implemented one. See this discussion.
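
A rough sketch of what `pos_weight` does, under the assumption that the model emits raw logits (no sigmoid applied): only the positive term of the loss is scaled, while negatives keep a weight of 1. That is one visible difference from a `[0.1, 0.9]`-style class weight, which scales both terms. The value `9.0` and the tensors are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 1)                       # raw scores, not probabilities
target = torch.randint(0, 2, (8, 1)).float()
pos_weight = torch.tensor([9.0])                 # illustrative value

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
loss = criterion(logits, target)

# The same quantity written out by hand
log_p = F.logsigmoid(logits)                     # log(sigmoid(x))
log_not_p = F.logsigmoid(-logits)                # log(1 - sigmoid(x))
manual = -(pos_weight * target * log_p + (1 - target) * log_not_p).mean()

print(torch.allclose(loss, manual))
```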

Do I understand it correctly that if I have a binary model output of size `(N, 1)`, the `pos_weight` argument should be of size `(1)`? If I have a class distribution of, let's say, 900:100 (900 zeros in my ground truth to 100 ones), then `pos_weight = torch.Tensor([(900 / 100)])`? I guess all zeros are negative examples and the ones are positives; that's also confusing me.

That's also how I understand the docs. Did you make any progress using it?
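
To make the 900:100 case concrete, here is a small sketch that derives `pos_weight` as negatives divided by positives and passes it to the criterion. The target tensor and the all-zero logits are dummy stand-ins chosen only to check shapes.

```python
import torch
import torch.nn as nn

# 900 negatives (0) and 100 positives (1), shaped (N, 1) like the model output
target = torch.cat([torch.zeros(900), torch.ones(100)]).view(-1, 1)

num_pos = target.sum()
num_neg = target.numel() - num_pos
pos_weight = (num_neg / num_pos).view(1)     # tensor([9.]), shape (1,)

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
logits = torch.zeros(1000, 1)                # dummy logits, just to run the loss
loss = criterion(logits, target)

print(pos_weight, pos_weight.shape)
```

With this setting each positive example contributes nine times as much to the loss as a negative one, so the two classes carry equal total weight.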

Training only got two 1080Ti GPUs for ActivityNet... test looked promising though.

Alright, tried it. My confusion matrix starts with only FNs and TNs, and the model never predicts the class I want it to (it still always outputs tensors of 0s).

EDIT: The softmax output always seems to be a tensor of (0.0325, ..., 0.0325), already after the first batch.

Could you try to play around with your learning rate a bit, i.e. increasing and lowering it?
It smells like your training got stuck.

I have pretty much the same problem. I have a class imbalance of 8:1.
I tried using BCE loss with different weights, i.e.

1. [0.75, 1.25] seems to work somewhat well, but got stuck at the end.
2. [0.5, 2.5] the model diverged sooner.

(Optimizer: ASGD with LR 0.01)

I used Adam first, but it didn't behave nicely, so I switched to ASGD.

Is it necessary to have the sum of weights == 1? (If yes, could you please explain why?)

This is my loss function:

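```python
import torch

def weighted_binary_cross_entropy(output, target, weights=None):
    # Clamp to avoid log(0); 1e-7 is used because 1 - 1e-8 rounds to 1.0 in float32
    output = torch.clamp(output, min=1e-7, max=1 - 1e-7)

    if weights is not None:
        assert len(weights) == 2
        # weights[1] scales the positive term, weights[0] the negative term
        loss = weights[1] * (target * torch.log(output)) + \
               weights[0] * ((1 - target) * torch.log(1 - output))
    else:
        loss = target * torch.log(output) + (1 - target) * torch.log(1 - output)

    # The posted snippet ends before a return; the negated mean is assumed here
    return torch.neg(torch.mean(loss))
```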

You would usually calculate the `pos_weight` using the complete training dataset and pass it to the instantiation of the criterion, so that a division by 0 should not occur.
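
As a sketch of that idea, the helper below (a hypothetical name, `compute_pos_weight`) counts positives and negatives over all target batches before the criterion is created. The two toy batches are made up; note that the second one contains no positives, so a per-batch ratio would divide by zero while the dataset-wide ratio does not.

```python
import torch

def compute_pos_weight(target_batches):
    """Hypothetical helper: (#negatives / #positives) over all target batches."""
    num_pos = 0.0
    num_total = 0
    for t in target_batches:
        num_pos += t.sum().item()
        num_total += t.numel()
    if num_pos == 0:
        # Still undefined if the whole dataset has no positive example
        raise ValueError("no positive examples found; pos_weight is undefined")
    return torch.tensor([(num_total - num_pos) / num_pos])

# Toy targets: 7 negatives and 1 positive in total
batches = [torch.tensor([0., 0., 0., 1.]), torch.tensor([0., 0., 0., 0.])]
pos_weight = compute_pos_weight(batches)
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)

print(pos_weight)
```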