Using BCELoss() with real-valued labels without any correspondence to a class


Let me start by saying that I know BCELoss is generally used for classification problems, but I'm also quite new to ML.
I'm trying to implement a system described in a paper, in which the authors build a NN whose output layer consists of a single neuron with a Sigmoid activation function. Consequently, the output of the NN is a number between zero and one. By comparing this output to a threshold, they decide whether to accept or refuse a given parameter value (the one given as input to the NN). That's why they call this NN a classifier, even though its output is a real value that doesn't correspond to any class. Moreover, they write that the loss function they use is the Binary Cross Entropy.

Summarizing: I should train a NN so that it outputs a certain value between 0 and 1, using BCELoss. So the label for each input will be a real quantity - 0.02, 0.96, … .

So, what I would like to ask is: is it really possible to use the BCELoss function in such a case, or am I missing something? I'm fairly sure I've understood the paper correctly, but maybe I should train the NN differently.
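For concreteness, here is a minimal sketch of what I mean (the model, shapes, and values are made up, not taken from the paper):

```python
import torch
import torch.nn as nn

# Hypothetical minimal model mirroring the paper's setup:
# a single output neuron followed by a Sigmoid.
model = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())
criterion = nn.BCELoss()

inputs = torch.randn(8, 4)    # 8 samples, 4 made-up features each
targets = torch.rand(8, 1)    # real-valued labels in [0, 1], e.g. 0.02, 0.96, ...

loss = criterion(model(inputs), targets)
loss.backward()               # runs fine: BCELoss accepts soft targets
```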

[ Sorry if I don't report any code and/or if this type of question is unusual, but I'm also new to these kinds of forums. If there are problems with my topic, I will delete it asap ]

Hi MngFrc!

Yes, this is a conventional binary-classification problem. The
input value for the parameter is either accepted – class-“1”,
“yes” – or refused – class-“0”, “no”. That is, the input has been
classified into one of two classes.

This is standard. The output of the classifier – a continuous
number between 0 and 1 – is understood as the predicted
probability of the input being in class-“1”. This is good for a
number of reasons, not least of which is that it is the result of
a differentiable computation and can be fed into a differentiable
loss function, so the model can be trained with back-propagation.

Yes. The key is that the continuous output of the model is to be
understood as the predicted probability of the input being in
class-“1”. And this is exactly what BCELoss expects.

(As an aside, for reasons of numerical stability, you’ll be better
off removing the Sigmoid and using BCEWithLogitsLoss.
This is mathematically – but not numerically – equivalent.)
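For example, a quick sanity check of that equivalence (shapes and values here are illustrative):

```python
import torch
import torch.nn as nn

logits = torch.randn(8, 1)    # raw model outputs, no Sigmoid applied
targets = torch.rand(8, 1)

# Sigmoid + BCELoss vs. the fused BCEWithLogitsLoss:
loss_a = nn.BCELoss()(torch.sigmoid(logits), targets)
loss_b = nn.BCEWithLogitsLoss()(logits, targets)

# They agree to within floating-point error for moderate logits,
# but the fused version stays numerically stable for large |logits|.
```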


K. Frank


Hi @KFrank!

First of all, thank you for your reply.
The thing that was stopping me from interpreting my problem as a classification one was that my threshold is not 0.5, but definitely smaller (let's say 0.1).
But thanks to your words I have understood the right point of view.

Since my classes are unbalanced, can I ask how I should train my NN? When I have a label equal to 0.9, for example - or any value greater than the threshold - should I round it to 1? Or should I somehow weight the two terms of the BCELoss differently?

The fact is that if I have a label equal to 0.5, the BCE loss will not, by its definition, decrease to zero if I leave the label unchanged. But also, I think the model should treat a label equal to 0.5 differently from one equal to 0.9.

Hi MngFrc!

Where does this threshold come from? Is it somehow given as
part of the problem, or is it something you choose? In the typical
case, you won’t be using a threshold while training your model.

The term unbalanced generally means that you have many
more examples of one class than another. Two approaches
are to sample the less common classes more frequently when
you train, or to weight the less common classes more heavily
in your loss function. (See, for example, the weight argument
passed to the constructor of BCELoss.)
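As an illustrative sketch (the class counts are made up), the weighting approach is often done with the pos_weight argument of BCEWithLogitsLoss, which rescales the positive-class term of the loss:

```python
import torch
import torch.nn as nn

# Suppose negatives greatly outnumber positives (made-up counts).
n_negative, n_positive = 900, 100

# A common heuristic: weight positives by the imbalance ratio.
pos_weight = torch.tensor([n_negative / n_positive])  # 9.0

criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, 1)                       # raw model outputs
targets = torch.randint(0, 2, (8, 1)).float()    # hard 0/1 labels
loss = criterion(logits, targets)
```

(The sampling approach would instead use something like torch.utils.data.WeightedRandomSampler.)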

But I’m not sure that this is what you mean by “unbalanced.”

You talk about having a label of “0.9”. Therefore I understand
that your samples are not labeled with integer class labels (for
which 0 would mean that the sample is in class-“0” and 1
would mean that the sample is in class-“1”), but rather, that your
labels are continuous values between 0 and 1 and represent
the probability that your sample is in class-“1”.

This is fine. You want to train your model to predict the values
of the probabilities given by your labels. Cross-entropy is a
measure of how dissimilar two probability distributions are, so
BCELoss is fully appropriate for this use case.

This is true, but it’s not a problem. The pytorch optimizers don’t
care whether your loss is zero – they just drive the loss (using
some flavor of gradient descent) to lower values (and if you
run them long enough, perhaps to its minimum value).

If your labelled probability is 0.5, then BCELoss takes on its
minimum value when your predicted probability is also 0.5.

You should. And BCELoss does this. If your label is 0.5,
BCELoss will prefer a predicted probability of 0.5. And if your
label is 0.9, BCELoss will prefer a predicted probability of 0.9.
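A small numerical check of this (the probe values are chosen arbitrarily):

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()
target = torch.tensor([0.5])   # a soft label

# Evaluate the loss at several candidate predicted probabilities.
losses = {p: criterion(torch.tensor([p]), target).item()
          for p in (0.1, 0.3, 0.5, 0.7, 0.9)}

# The minimum sits at a prediction of 0.5, where the loss is
# -log(0.5) ≈ 0.693 -- minimal, but not zero.
```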

In summary – if I understand what you are doing – you are
working with a binary classification problem, but with the
(perfectly reasonable) feature that your labels are continuous
probabilities between 0 and 1, rather than being integer class
labels equal to either 0 or 1.

BCELoss is just fine for this use case, and there’s no need to
use any kind of threshold in your training.

(But to repeat my earlier comment, you’ll be better off using the
mathematically-equivalent BCEWithLogitsLoss.)

Good luck.

K. Frank

Hi @KFrank !

Again, thank you for your time. I really appreciate it.
I see that I caused a bit of confusion, so sorry for that. I will try to summarize what my problem consists of.

I am looking for a NN that, given an input that represents the parameters of a system, outputs its error probability. The final aim is to find the right parameter tuning iteratively: I start with an input, and if its error probability is above a threshold, I reduce the input until I find a satisfactory error probability [below the threshold].

So my dataset consists of a set of measurements that couple system parameters and error probabilities.

That is why I said that my labels cannot really be seen as classes.

I see two possible approaches (I guess):

  1. I leave the error probabilities as they are and I use them as labels
  2. I round the measured error probabilities in the dataset so that they are replaced with 1 if they are above the threshold, and with 0 otherwise.

The only constraint that I have is to use the BCELoss() function.

From this I guess that the first option should be the preferable one (please correct me if I'm wrong).
[I tried this approach and obtained quite large losses, but I don't exclude implementation problems. Anyway, that was the reason I opened this topic: I thought that BCELoss() was fine only in the case where I have {0,1} classes (and so labels)]
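To make the "quite large losses" point concrete, here is a small check (the label values are made-up examples) showing that with soft labels even a perfect model cannot drive BCELoss to zero; its floor is the average entropy of the labels:

```python
import torch
import torch.nn as nn

targets = torch.tensor([0.02, 0.96, 0.5, 0.1])   # made-up error probabilities

# Loss of a "perfect" model whose predictions equal the labels exactly.
perfect_loss = nn.BCELoss()(targets, targets)

# This equals the average entropy of the labels -- the irreducible floor
# below which the loss cannot go, no matter how well the model is trained.
eps = 1e-12
floor = -(targets * (targets + eps).log()
          + (1 - targets) * (1 - targets + eps).log()).mean()
```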

Hello MngFrc!

It’s not at all clear to me what you are trying to do, so it’s hard
to judge whether your approach makes sense.

It might be a good idea for you to start a new thread.

If you do, could you make concrete what data you’re starting
with, and what the numbers mean? What is the shape of the
input to your network? Roughly – in words – what is the
architecture of your network? Just a few Linear layers?
Lots of convolutions and poolings and dropouts, etc.?

You say that you have “input that represents the parameters of a
system,” and that you want “to look for the right parameter tuning
through an iterative way.” Does this mean that you want to tune
your input parameters, rather than the parameters (e.g., weights
and biases) in the neural network?

You say “I reduce the input until I find a satisfactory error
probability.” Does that mean that you simply make the values
of your input parameters smaller? Or do you adjust your input
parameters in some more complicated way? Are you hoping to
use a pytorch optimizer to adjust your input parameters for you?

(Again, these questions aren’t specifically about BCELoss, so
it would make sense to start a new thread.)


K. Frank

Ok, I will do!

Being new to ML, I'm afraid I wasn't very clear.

Anyway yes, I want to tune my input parameters.

And yes, I simply make the values of my input parameters smaller, without any advanced technique.

Thank you again, best wishes!