# Loss function for binary classification

Hey all,

I am trying to use `BCELoss` with weights, but I am struggling to understand how. I am currently using an LSTM model to detect an event in time-series data. Both the model output and the true output have shape `[batch_size, seq_length]`.

Currently I have managed to hard-code it, but this is not the best way to achieve it:

```python
# self.criterion is BCELoss(reduction='none'), so loss_get keeps per-element losses
loss_get = self.criterion(predictions.float(), target.float())
loss_flat = loss_get.flatten()
target_flat = target.flatten()
loss_flat[target_flat == 1] *= self.pos_weight_factor  # up-weight the positives
loss = loss_flat.mean()
loss.backward()
```

My dataset is imbalanced: the sequences are not of constant length, and there are far more 0's than 1's (approximately 100:1), so I need to up-weight the loss on the 1's by multiplying it by some factor. I understand there are a few topics on this, but I cannot quite get my head around it: how do I apply a weighted BCE loss to an imbalanced dataset, and what should the weight tensor contain?

I would therefore like to apply the weights using the built-in functionality, or the solution suggested in this post: https://discuss.pytorch.org/t/solved-class-weight-for-bceloss/3114?u=ykukkim

Can anyone guide me through this?

Thanks!

Hello Yong Kuk!

The most straightforward way to do this (and also better for numerical
reasons) is to adjust your network so that it outputs raw-score logits
for its predictions, rather than probabilities. (For example, if the last
layer of your network is a `Sigmoid` – that converts a logit to a
probability – just get rid of the `Sigmoid` layer.)

Then use `BCEWithLogitsLoss` instead of `BCELoss`. This is because
`BCEWithLogitsLoss` offers a `pos_weight` argument that it uses to
reweight positive samples in the loss function. In your case you would
set `pos_weight` to something like 100. (`BCELoss` does not have a
`pos_weight` argument – probably just an oversight, rather than for
any particular reason.)
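A minimal sketch of that setup (the `[batch_size, seq_length]` shape and the ~100:1 ratio come from your post; the tensors here are just random placeholders standing in for your LSTM's output):

```python
import torch
import torch.nn as nn

# Random stand-ins for the model's raw logits (no Sigmoid layer) and
# the 0/1 targets, in the post's [batch_size, seq_length] shape.
batch_size, seq_length = 4, 50
logits = torch.randn(batch_size, seq_length, requires_grad=True)
target = (torch.rand(batch_size, seq_length) < 0.01).float()  # ~100:1 imbalance

# pos_weight scales the loss term of every positive (target == 1) element,
# so pos_weight=100 plays the role of your pos_weight_factor.
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(100.0))
loss = criterion(logits, target)
loss.backward()
```

Note that `pos_weight` is a tensor, not a plain float; a single-element tensor broadcasts over the whole output.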

For some further details, please take a look at this recent thread:

Good luck!

K. Frank

Hey Frank,

However, I am very new to machine learning, and I am slightly confused by the following terms:

multi-label, multi-class classification

Would you care to explain this for me?

Furthermore, your method seems to be pretty much the same as what I have already done, since the `Sigmoid` is performed internally by `BCEWithLogitsLoss`. Have I understood correctly?

Thank you!

Hi Yong Kuk!

By way of example, in a conventional three-class (“cat,” “dog,” “bird”)
classification problem, given an image, you would say that it is an
image of exactly one of a cat or a dog or a bird. (And you wouldn’t say
it was “none of the above” unless you explicitly had a fourth, “none of
the above” class.)

In a multi-label (and in this case, three-class) classification problem
you would say that an image does or does not contain a cat, and
also does or does not contain a dog, and also does or does not contain
a bird. It can contain any combination, and it might not contain any of
the above, and it might contain all three. You can see that such a
multi-label problem is three binary problems (cat: yes or no, dog:
yes or no, bird: yes or no) run at the same time with the same network.
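To make the difference concrete, here is how the targets themselves would look for the two problems (the cat/dog/bird classes are just the illustration above):

```python
import torch

# Multi-class: each image gets exactly one integer label
# out of {cat=0, dog=1, bird=2}.
multiclass_target = torch.tensor([0, 2, 1])  # three images, one class each

# Multi-label: each image gets one 0/1 flag per class, and any
# combination is allowed, including none.  Columns: [cat, dog, bird].
multilabel_target = torch.tensor([
    [1., 0., 0.],  # cat only
    [1., 0., 1.],  # cat and bird
    [0., 0., 0.],  # none of the above
])

# A multi-label network emits one logit per class per image and pairs
# with BCEWithLogitsLoss; the multi-class one pairs a single label per
# image with CrossEntropyLoss.
```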

Yes, `BCEWithLogitsLoss` calculates `LogSigmoid` internally (in effect
calculating `Sigmoid` internally). This is numerically more stable than
passing your logits through `Sigmoid` and then passing them to
`BCELoss`. (Unless you have specific reason why you need to use
`BCELoss` – and understand it – you should always use
`BCEWithLogitsLoss` instead.)
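You can check that the two routes compute the same quantity for well-behaved logits (the shapes here are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 10)
target = torch.randint(0, 2, (4, 10)).float()

# Same loss, two ways: BCEWithLogitsLoss fuses the Sigmoid into the loss
# via LogSigmoid, so it never has to evaluate log(0) for extreme logits.
fused = nn.BCEWithLogitsLoss()(logits, target)
separate = nn.BCELoss()(torch.sigmoid(logits), target)

print(torch.allclose(fused, separate, atol=1e-6))  # True for moderate logits
```

For very large positive or negative logits the separate `Sigmoid` saturates to exactly 0 or 1 in floating point, and `BCELoss` has to clamp the resulting `log(0)`; the fused version avoids that entirely.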

Best.

K. Frank