Loss function for binary classification

Hey all,

I am trying to utilise BCELoss with weights, but I am struggling to understand how. I am currently using an LSTM model to detect an event in time-series data. My output from the model and the true_output both have the shape [batch_size, seq_length].

Currently, I think I have managed to hard-code it, but it is not the best way to achieve this:

    # self.criterion is nn.BCELoss(reduction='none'), so we get one loss value per element
    loss_get = self.criterion(predictions.float(), target.float())
    # weighted binary cross-entropy: up-weight the loss on the positive samples
    loss_flat = loss_get.flatten()
    target_flat = target.flatten()
    loss_flat[target_flat == 1] *= self.pos_weight_factor
    loss = loss_flat.mean()
    loss.backward()

My datasets are imbalanced: the sequences do not have a constant length, and there are far more 0’s than 1’s (approximately 100:1), hence I need to up-weight the loss on the 1’s by multiplying it by some arbitrary factor. I understand that there are a few topics on this, but I cannot quite get my head around them. How do I apply a weighted BCE loss to an imbalanced dataset? What will the weight tensor contain?
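For instance, here is my guess at what the weight tensor might contain, as a per-element factor built from the targets (pos_weight_factor being my arbitrary number, e.g. 100):

    import torch
    import torch.nn.functional as F

    pos_weight_factor = 100.0

    # predictions: probabilities in [0, 1]; target: 0./1. labels, both [batch_size, seq_length]
    predictions = torch.rand(8, 50)
    target = (torch.rand(8, 50) < 0.01).float()

    # per-element weight: pos_weight_factor for positives, 1 for negatives
    weight = torch.where(target == 1.0,
                         torch.full_like(target, pos_weight_factor),
                         torch.ones_like(target))

    loss = F.binary_cross_entropy(predictions, target, weight=weight)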

Therefore, if I wanted to apply weights, should I use the built-in function, or the solution suggested by this post? https://discuss.pytorch.org/t/solved-class-weight-for-bceloss/3114?u=ykukkim

Can anyone guide me through this?

Thanks!

Hello Yong Kuk!

The most straightforward way to do this (and also better for numerical
reasons) is to adjust your network so that it outputs raw-score logits
for its predictions, rather than probabilities. (For example, if the last
layer of your network is a Sigmoid – that converts a logit to a
probability – just get rid of the Sigmoid layer.)
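As a sketch (I haven’t seen your model, so the layer names and sizes here are made up for illustration), the logits version would look something like:

    import torch.nn as nn

    class EventDetector(nn.Module):
        def __init__(self, input_size, hidden_size):
            super().__init__()
            self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
            self.fc = nn.Linear(hidden_size, 1)
            # note: no nn.Sigmoid() here -- the model returns raw-score logits

        def forward(self, x):
            # x: [batch_size, seq_length, input_size]
            out, _ = self.lstm(x)
            return self.fc(out).squeeze(-1)   # logits, [batch_size, seq_length]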

Then use BCEWithLogitsLoss instead of BCELoss. This is because
BCEWithLogitsLoss offers a pos_weight argument that it uses to
reweight positive samples in the loss function. In your case you would
set pos_weight to something like 100. (BCELoss does not have a
pos_weight argument – probably just an oversight, rather than for
any particular reason.)
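Something like this (the value of 100 just reflects the 100:1 ratio you quoted):

    import torch
    import torch.nn as nn

    # pos_weight rescales the loss term for positive (target == 1) samples
    criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(100.0))

    logits = torch.randn(8, 50, requires_grad=True)   # raw scores from the network
    target = (torch.rand(8, 50) < 0.01).float()       # imbalanced 0/1 labels

    loss = criterion(logits, target)
    loss.backward()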

For some further details, please take a look at this recent thread:

Good luck!

K. Frank

Hey Frank,

Thank you for your reply! It has cleared up a few things for me, and reading through your thread helped even more.

However, I am very new to machine learning, and I am slightly confused by the following terms:

multi-label, multi-class classification

Would you care to explain these for me?

Furthermore, it seems to me that your method is pretty much the same as what I have already done, since the sigmoid function is performed internally by BCEWithLogitsLoss. Have I understood correctly?

Thank you!

Hi Yong Kuk!

By way of example, in a conventional three-class (“cat,” “dog,” “bird”)
classification problem, given an image, you would say that it is an
image of exactly one of a cat or a dog or a bird. (And you wouldn’t say
it was “none of the above” unless you explicitly had a fourth, “none of
the above” class.)

In a multi-label (and in this case, three-class) classification problem
you would say that an image does or does not contain a cat, and
also does or does not contain a dog, and also does or does not contain
a bird. It can contain any combination, and it might not contain any of
the above, and it might contain all three. You can see that such a
multi-label problem is three binary problems (cat: yes or no, dog:
yes or no, bird: yes or no) run at the same time with the same network.
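To make the difference concrete, here is a sketch of what the targets look like in the two settings (the class order cat, dog, bird is just for illustration):

    import torch

    # multi-class: one integer class index per sample (as used by CrossEntropyLoss)
    # 0 = cat, 1 = dog, 2 = bird
    multiclass_target = torch.tensor([0, 2, 1])

    # multi-label: one yes/no flag per class per sample (as used by BCEWithLogitsLoss)
    # columns: [cat, dog, bird]
    multilabel_target = torch.tensor([[1., 0., 0.],   # just a cat
                                      [1., 1., 1.],   # cat, dog, and bird
                                      [0., 0., 0.]])  # none of the above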

Yes, BCEWithLogitsLoss calculates LogSigmoid internally (in effect
calculating Sigmoid internally). This is numerically more stable than
passing your logits through Sigmoid and then passing them to
BCELoss. (Unless you have specific reason why you need to use
BCELoss – and understand it – you should always use
BCEWithLogitsLoss instead.)
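You can check that the two versions agree (up to floating-point noise) with something like:

    import torch
    import torch.nn as nn

    logits = torch.randn(8, 50)
    target = torch.randint(0, 2, (8, 50)).float()

    loss_a = nn.BCEWithLogitsLoss()(logits, target)
    loss_b = nn.BCELoss()(torch.sigmoid(logits), target)

    print(torch.allclose(loss_a, loss_b))   # True for logits of moderate size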

Best.

K. Frank