Unclear about Weighted BCE Loss

Hey there super people!
I am having trouble understanding the BCELoss weight parameter. I am working on a binary classification problem: I have an RNN which produces a binary classification for each time step of a sequence. Precisely, it produces an output of size (batch, sequence_len) where each element is in the range 0 to 1 (a confidence score for how likely an event happened at that time step).
Because the ground truth is imbalanced (more 0s than 1s), I would like to make use of the BCELoss weight parameter, but I clearly do not understand the docs well enough…

What is the size of this parameter, and what values should its elements take?


If you would like to use the weight as a class weight, I assume you have a binary target, i.e. it contains only zeros and ones.
In that case I would create a weight tensor and just multiply it with your unreduced loss. Here is a small example:

import torch
import torch.nn as nn

# index 0 holds the weight for class 0, index 1 the weight for class 1
weight = torch.tensor([0.1, 0.9])
# look up the weight for each target element and reshape to the target's shape
weight_ = weight[y.view(-1).long()].view_as(y)
criterion = nn.BCELoss(reduction='none')  # keep the per-element losses
loss = criterion(output, y)
loss_class_weighted = loss * weight_
loss_class_weighted = loss_class_weighted.mean()
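
To make the indexing trick concrete, here is a toy run with made-up values: each target element picks its class weight via integer indexing.

import torch

weight = torch.tensor([0.1, 0.9])
y = torch.tensor([[0., 1., 0.], [1., 1., 0.]])  # hypothetical binary targets

# each 0 in y selects weight[0], each 1 selects weight[1]
weight_ = weight[y.view(-1).long()].view_as(y)
print(weight_)
# tensor([[0.1000, 0.9000, 0.1000],
#         [0.9000, 0.9000, 0.1000]])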

Let me know if that works for you.


It didn’t quite work that well, but I just sampled batches with a 1:1 ratio of background:activity labels, which turned out to be a neat solution!
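
For anyone reading along: one common way to sample roughly balanced batches like this is torch.utils.data.WeightedRandomSampler. A minimal sketch, assuming targets is a 1D tensor of 0/1 labels (the data and names here are made up; the poster's actual sampling code may differ):

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# hypothetical imbalanced dataset: 900 background vs. 100 activity labels
data = torch.randn(1000, 10)
targets = torch.cat([torch.zeros(900), torch.ones(100)])

# weight each sample by the inverse frequency of its class
class_counts = torch.bincount(targets.long())
sample_weights = 1.0 / class_counts[targets.long()].float()

sampler = WeightedRandomSampler(sample_weights, num_samples=len(targets), replacement=True)
loader = DataLoader(TensorDataset(data, targets), batch_size=32, sampler=sampler)
# each batch now contains roughly a 1:1 ratio of the two classes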

Good to hear.
Maybe the new pos_weight argument for nn.BCEWithLogitsLoss might work better, since there is a difference between my weighting approach and the implemented one. See this discussion.


Do I understand it correctly that, if I have a binary model output of size (N, 1), the pos_weight argument should be of size (1)? If I have a class distribution of, let’s say, 900:100 (900 zeros in my ground truth to 100 ones), then pos_weight = torch.tensor([900 / 100])? I guess all zeros are negative examples and the ones are positives; that’s also confusing me :thinking:
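
For reference, a minimal sketch of that reading of the docs (the 900:100 split is the hypothetical one from above): pos_weight scales the loss term of the positive class, so negatives/positives = 900/100 = 9 up-weights the rare ones.

import torch
import torch.nn as nn

# hypothetical class counts: 900 negatives, 100 positives
pos_weight = torch.tensor([900 / 100])  # shape (1,) for a single binary output

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, 1)                     # raw model outputs (no sigmoid)
targets = torch.randint(0, 2, (8, 1)).float()
loss = criterion(logits, targets)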

That’s also how I understand the docs. Did you make any progress using it?

Still training :smiley: I only have two 1080Ti GPUs for ActivityNet… the test looked promising though.

Alright, tried it :see_no_evil::see_no_evil::see_no_evil: My confusion matrix starts out with only FNs and TNs, and the model never predicts the class I want it to :confused: (it still always outputs tensors of 0s)

EDIT: The softmax output always seems to be a tensor of (0.0325, …, 0.0325), already after the first batch :confused:

Could you try playing around with your learning rate a bit, i.e. increasing and lowering it?
It smells like your training got stuck.

I have pretty much the same problem: a class imbalance of 8:1.
I tried using BCE loss with different weights, i.e.:

  1. [0.75, 1.25] seemed to work a bit better, but training got stuck at the end.
  2. [0.5, 2.5] made the model diverge sooner.
    (Optimizer: ASGD with LR 0.01)

I used Adam first, but it didn’t behave nicely, so I switched to ASGD.

Is it necessary for the weights to sum to 1? (If yes, could you please explain why?)

This is my loss function:

import torch

def weighted_binary_cross_entropy(output, target, weights=None):
    # clamp to avoid log(0) = -inf when output is exactly 0 or 1
    output = torch.clamp(output, min=1e-8, max=1 - 1e-8)

    if weights is not None:
        assert len(weights) == 2
        # weights[1] scales the positive term, weights[0] the negative term
        loss = weights[1] * (target * torch.log(output)) + \
               weights[0] * ((1 - target) * torch.log(1 - output))
    else:
        loss = target * torch.log(output) + (1 - target) * torch.log(1 - output)

    return torch.neg(torch.mean(loss))
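
A quick usage example with made-up tensors (the weight values are just placeholders):

output = torch.sigmoid(torch.randn(4, 10, requires_grad=True))  # hypothetical predictions
target = torch.randint(0, 2, (4, 10)).float()

loss = weighted_binary_cross_entropy(output, target, weights=[0.75, 1.25])
loss.backward()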

What did you use torch.clamp for?

And do you know what weights are good if my data is 1:100,000?

@Rachel_Copperman
torch.clamp is used here for numerical stability, since output can be exactly one or zero and torch.log(0) evaluates to -infinity.
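
A two-line demonstration of the problem and the fix:

import torch

print(torch.log(torch.tensor(0.)))                         # tensor(-inf)
print(torch.log(torch.clamp(torch.tensor(0.), min=1e-8)))  # tensor(-18.4207)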

@ptrblck, I just wonder how to apply the weight in BCELoss in a real training scenario. That is, the dataset is split into training, validation, and test subsets. Moreover, model training runs over several epochs, and each epoch is comprised of one or more batches. So the weight could be calculated at the dataset level, the training-subset level, or even the batch level. How should I choose? Note that at the batch level the number of positives or negatives in a class may be zero, so the following could occur: pos_weight = torch.Tensor([(negative examples / 0)]).
Another question: Adamax and BCELoss are used, but my F1 score is just 0.04. How can I tune the performance?

You would usually calculate the pos_weight using the complete training dataset and pass it to the instantiation of the criterion, so that a division by 0 should not occur.
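
A minimal sketch of that, assuming train_targets is a tensor holding all binary training labels (the variable name is made up):

import torch
import torch.nn as nn

# count positives and negatives once over the whole training set
num_pos = train_targets.float().sum()
num_neg = train_targets.numel() - num_pos

pos_weight = num_neg / num_pos  # e.g. 900 / 100 = 9.0
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)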