How to assign class weights in the loss function with BCELoss

I have a 2d tensor of shape (batch_size, 500), where 500 is the number of voice frames taken at a time. Each of these frames has one of two labels, 0 or 1: 0 denotes silence while 1 denotes speech.

tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 1.,  ..., 0., 0., 0.]], device='cuda:0')

After passing the input features into my BiLSTM model, this is the output tensor:

tensor([[0.5058, 0.5088, 0.5075,  ..., 0.5071, 0.5079, 0.5057],
        [0.5010, 0.4988, 0.4984,  ..., 0.5046, 0.5041, 0.5022],
        [0.5081, 0.5079, 0.5069,  ..., 0.4985, 0.4982, 0.4992],
        ...,
        [0.5064, 0.5104, 0.5117,  ..., 0.5039, 0.5040, 0.5041],
        [0.5049, 0.5075, 0.5079,  ..., 0.5178, 0.5174, 0.5162],
        [0.4936, 0.4948, 0.4970,  ..., 0.5033, 0.5038, 0.5041]],
       device='cuda:0', grad_fn=<SqueezeBackward0>)

Now, from the above tensor I am assigning 1 if the value is above 0.5 and 0 otherwise, so that each of the 500 frames has a 1 or 0 assigned to it. After that I am calculating the BCELoss and then backpropagating. But the issue is that the input labels are unbalanced, so I want to assign class weights to the labels when calculating the BCELoss.

After obtaining the class weights from the compute_class_weight function in sklearn, I am getting class_weights as array([0.59432247, 3.15048184]). But when I pass this as a tensor to BCELoss I am getting an error:

RuntimeError: The size of tensor a (500) must match the size of tensor b (2) at non-singleton dimension 1

This is probably happening because the output tensor size is (batch_size, 500) while I have class weights for the two labels [0, 1].
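
A minimal sketch of what I am doing (the model is omitted and the shapes and data are placeholders):

import torch
import torch.nn as nn

output = torch.rand(32, 500)                    # model output after sigmoid, (batch_size, 500)
target = (torch.rand(32, 500) > 0.9).float()    # unbalanced 0/1 labels

class_weights = torch.tensor([0.59432247, 3.15048184])  # from compute_class_weight
criterion = nn.BCELoss(weight=class_weights)    # weight has shape [2]
loss = criterion(output, target)                # RuntimeError: size mismatch at dim 1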

Can anyone help me with how to assign the class weights during training? Any kind of help would be greatly appreciated.

Thank You

Hi Amartya!

The short story: Use BCEWithLogitsLoss with pos_weight.

First, you should be using BCEWithLogitsLoss (without a final sigmoid()
in your model) for reasons of numerical stability. Let me answer your
question in the context of BCEWithLogitsLoss.

BCEWithLogitsLoss (unlike BCELoss) takes a pos_weight constructor
argument that multiplies the positive-class contribution to the loss function.
Typically, if you had, say, ten times as many negative samples as positive
samples, you would use, approximately, pos_weight = 10.0.
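
For example, a minimal sketch (the batch shape and the 10-to-1 imbalance here are made up):

import torch
import torch.nn as nn

# weight the "positive" class ten times as heavily as the "negative" class
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([10.0]))

logits = torch.randn(32, 500)                   # raw model output -- no final sigmoid()
target = (torch.rand(32, 500) > 0.9).float()    # 0.0 / 1.0 labels
loss = criterion(logits, target)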

For BCELoss, on the other hand, you pass in a weight constructor
argument that (for your use case) is the same shape as the output of
your model, with a separate weight value for each of its elements.
So you would build a tensor with 1.0s in the positions of the negative
samples and 10.0s in the positions of the positive samples. You would
have to build this weight tensor separately for each batch and use it
to construct a new BCELoss object for each batch.
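
A sketch of what that would look like (again with made-up shapes and a 10-to-1 weighting):

import torch
import torch.nn as nn

output = torch.rand(32, 500)                    # sigmoid output of the model
target = (torch.rand(32, 500) > 0.9).float()

# per-element weight tensor, same shape as output:
# 1.0 for negative samples, 10.0 for positive samples
weight = torch.ones_like(target)
weight[target == 1.0] = 10.0

criterion = nn.BCELoss(weight=weight)           # rebuilt for every batch
loss = criterion(output, target)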

Best.

K. Frank

Hi @KFrank, thank you for your reply. Can you kindly tell me how I can create the weight tensor in the case of BCEWithLogitsLoss? While using BCELoss I was calculating the class weights for every batch, as you rightly mentioned, using the sklearn.utils.class_weight.compute_class_weight function. Shall I do the same for BCEWithLogitsLoss?

Hi Amartya!

I don’t fully understand your use case.

In the typical use case, BCEWithLogitsLoss takes (as does BCELoss)
an input argument that is the predictions output by your model and a
target argument that is your ground-truth labels.

input and target are the same shape; for example, they could be batches
of 500-frame time series. target typically consists of the values 0.0 (which
would be the label for the “negative”-class samples) and 1.0 (the label for
the “positive” class) (but target can also contain “probabilistic” labels whose
values range from 0.0 to 1.0 and represent the probability that the item in
question belongs to the “positive” class).
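
As a concrete sketch (the shapes and data here are placeholders):

import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

logits = torch.randn(4, 500)                      # input: raw scores, same shape as target
hard_target = (torch.rand(4, 500) > 0.5).float()  # labels are 0.0 or 1.0
soft_target = torch.rand(4, 500)                  # "probabilistic" labels in [0.0, 1.0]

loss_hard = criterion(logits, hard_target)
loss_soft = criterion(logits, soft_target)        # also valid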

In the typical use case, pos_weight is a single value (but packaged as a
tensor, e.g., of shape [1]). The exact value of pos_weight shouldn’t
matter, but you typically want it to be inversely proportional to the number
of “positive” samples.

I recommend counting the number of “positive”-class and “negative”-class
labels in your training set (or in a representative subset of your training
set) and setting pos_weight = num_negative / num_positive. The
idea is that if you don’t have very many “positive” samples in your training
set, you want to make up for it by weighting the “positive” samples more
heavily in your loss function.

(People sometimes compute pos_weight on a per-batch basis, but I think
that it’s preferable to use the same pos_weight for all the batches.)
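
As a sketch, with a hypothetical all_labels tensor standing in for your
training-set labels:

import torch
import torch.nn as nn

all_labels = (torch.rand(10000, 500) > 0.9).float()    # placeholder for your real labels

num_positive = all_labels.sum()
num_negative = all_labels.numel() - num_positive
pos_weight = (num_negative / num_positive).reshape(1)  # a single value, shape [1]

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)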

As an aside: You posted above:

> Now, from the above tensor I am assigning 1 if the value is above 0.5 and 0 otherwise, so that each of the 500 frames has a 1 or 0 assigned to it. After that I am calculating the BCELoss and then backpropagating.

I’m not certain what you mean by this, but if you threshold the output of your
model and then pass those thresholded values to your loss function, you
won’t be able to (usefully) backpropagate through the thresholding step,
your gradients will be zero, and your model won’t train.
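
You can see why with a two-element toy example: the comparison op isn't
differentiable, so thresholding cuts the autograd graph:

import torch

x = torch.tensor([0.3, 0.7], requires_grad=True)
y = (x > 0.5).float()       # comparison produces a tensor outside the graph
print(y.requires_grad)      # False -- no gradient can flow back to x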

Best.

K. Frank

Hi @KFrank

Thanks for the swift reply. I am basically getting a distribution as the output of my model, where the last layer is a sigmoid:

tensor([[0.5058, 0.5088, 0.5075,  ..., 0.5071, 0.5079, 0.5057],
        [0.5010, 0.4988, 0.4984,  ..., 0.5046, 0.5041, 0.5022],
        [0.5081, 0.5079, 0.5069,  ..., 0.4985, 0.4982, 0.4992],
        ...,
        [0.5064, 0.5104, 0.5117,  ..., 0.5039, 0.5040, 0.5041],
        [0.5049, 0.5075, 0.5079,  ..., 0.5178, 0.5174, 0.5162],
        [0.4936, 0.4948, 0.4970,  ..., 0.5033, 0.5038, 0.5041]],
       device='cuda:0', grad_fn=<SqueezeBackward0>)

But as my original labels consist of positive (1) and negative (0) samples, I wanted to calculate the accuracy between the original and predicted labels; that’s why I used a threshold to convert the distribution into 0s and 1s. I am not using the threshold to calculate the loss.
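
Roughly like this (a minimal sketch with placeholder data, not my actual code):

import torch
import torch.nn as nn

output = torch.rand(4, 500)                     # sigmoid output of the model
target = (torch.rand(4, 500) > 0.9).float()

criterion = nn.BCELoss()
loss = criterion(output, target)                # loss uses the raw probabilities

with torch.no_grad():                           # threshold only for the accuracy metric
    preds = (output > 0.5).float()
    accuracy = (preds == target).float().mean()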

I am very sorry if I was unclear before.

Thank You