# How to interpret the probability of classes in binary classification?

Hi,
I’m working on a binary classification problem with `BCEWithLogitsLoss`. My classes are just 0 and 1, so my output is a single number. During testing, I would like to get the probabilities for each class. After running the test set through the model, I pass the output values through `torch.sigmoid` to get the probabilities. What I would like to know is what that number signifies.

For example, if one of the output probabilities is 0.2, does this mean there is a 0.2 probability that this example belongs to class 0, or to class 1?

Thanks.

Hi Shaun!

In short, the output of `sigmoid()` is the (predicted) probability
P(class 1) = 0.2 (and therefore P(class 0) = 0.8).

This can be found (a bit opaquely) in the BCEWithLogitsLoss
documentation, but you have to kind of parse through the
equations to get this information.

(As an aside, and as you are probably aware, the `sigmoid`
function is, in effect, built into `BCEWithLogitsLoss`.)
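For instance, this equivalence can be checked directly (a minimal sketch with made-up logits and targets):

```python
import torch

# Raw model outputs (logits) and 0/1 targets for a toy batch
logits = torch.tensor([1.5, -0.3, 0.8])
targets = torch.tensor([1.0, 0.0, 1.0])

# BCEWithLogitsLoss applies sigmoid internally ...
loss_with_logits = torch.nn.BCEWithLogitsLoss()(logits, targets)

# ... so it matches BCELoss applied to sigmoid(logits)
loss_manual = torch.nn.BCELoss()(torch.sigmoid(logits), targets)

print(torch.allclose(loss_with_logits, loss_manual))  # True
```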

Good luck.

K. Frank

So in effect P(class_label == 0) = 0.2 and P(class_label == 1) = 0.8?

Yes, I am aware of this. However, when predicting on the test set, I don’t use the loss function. I just pass the test set to the model and get the output, which in this case is not a set of probabilities but just the raw output values of the model. So I apply a `sigmoid` to the output to convert the values to probabilities.

Essentially, after training the model, during final testing I run this piece of code:

```python
classifier = (...)
y_pred = classifier(x_test)
probs = torch.sigmoid(y_pred)
predicted_vals = probs > threshold
```

Given this code, my assumption is that `predicted_vals` will contain the predictions of the class labels by the model which I can then compare to `y_test` to get the final performance of the model. Am I correct in my assumption?
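Concretely, the comparison I have in mind is something like this (a toy sketch; the tensors here are made up):

```python
import torch

threshold = 0.5

# Pretend these came from classifier(x_test)
y_pred = torch.tensor([2.0, -1.0, 0.4, -3.0])
y_test = torch.tensor([1.0, 0.0, 1.0, 0.0])

probs = torch.sigmoid(y_pred)
predicted_vals = probs > threshold

# Compare predicted labels to the true labels
accuracy = (predicted_vals.float() == y_test).float().mean()
print(accuracy.item())  # 1.0 for this toy batch
```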

1 Like

It is the probability that the example belongs to class 1. In other words, the network outputs p(t=1|x). This can be derived by applying Bayes’ theorem to the binary classification problem. You can see this in the book Pattern Classification by Duda and Hart, or Pattern Recognition and Machine Learning by C. M. Bishop.

I’m confused. Are you saying the same thing as the other person? When you say class 1 are you referring to the class with class label 0 or class label 1?

Yes, the class labelled with a 1.

Hi Shaun!

Yes, this is correct. To give a little more detail, `y_pred` is a
number that indicates how strongly (in some units) the model
is predicting class = “1”. This number is typically called the
logit. `probs = torch.sigmoid(y_pred)` is the predicted
probability that class = “1”. And `predicted_vals` is the
predicted class label itself (0 or 1).

As a practical matter, you don’t need to calculate `sigmoid`.
You can save a little bit of time (though probably a trivial
amount) by leaving it out.

If threshold were 0.5 (that is, predict class = “1” when
P(class = “1”) > 1/2), then you could use
`predicted_vals = y_pred > 0`.

More generally, you can compare `y_pred` with the
inverse-sigmoid of the threshold you want. This is typically
called the logit function, and is given by `log (p / (1 - p))`.

(Given a probability value p, 0 < p < 1, inverse-sigmoid (p) =
logit (p) = log (p / (1 - p)).)

So:

```python
logit_threshold = torch.tensor(threshold / (1 - threshold)).log()
...
predicted_vals = y_pred > logit_threshold
```

That is, instead of applying `sigmoid` to all of your `y_pred`s,
you calculate `inverse-sigmoid` of `threshold` once.
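A quick numerical check of the equivalence (a sketch with made-up logits and an arbitrary threshold):

```python
import torch

threshold = 0.7
y_pred = torch.tensor([2.0, -0.5, 0.8, 1.2, -2.0])

# Route 1: apply sigmoid to every prediction, then compare to the threshold
via_sigmoid = torch.sigmoid(y_pred) > threshold

# Route 2: compute the inverse-sigmoid (logit) of the threshold once
logit_threshold = torch.tensor(threshold / (1 - threshold)).log()
via_logits = y_pred > logit_threshold

print(torch.equal(via_sigmoid, via_logits))  # True
```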

Best.

K. Frank

2 Likes

Thank you for that, it is really helpful.

One point of confusion is when you say class “1”, which class are you actually referring to?

In my problem, I have the following classes: non-imminent and imminent, which have the labels 0 and 1 respectively. In other words, non-imminent (class label 0) is the negative class and imminent (class label 1) is the positive class. Furthermore, this is an imbalanced-dataset problem, with the positive class (imminent, class label 1) having a prevalence of 0.2.

When you say class “1”, would that in this case refer to the imminent (class label 1) positive class? That is the assumption I’ve been working under.

Basically, I’m assuming that the model (after passing the output through a `sigmoid`) gives me the probability of the imminent (class label 1) positive class. I then define a threshold based on my metric: if the probability of the positive class (imminent, class label 1) is greater than the threshold, give the prediction a class label of 1; else, give it a class label of 0. Is this correct?

Sorry for pressing on this, but it determines how I evaluate my model’s performance, and I want to make sure I do it correctly.

Thank you.

Hey @shaun, AFAIK it’s exactly what you are saying: the result of the sigmoid for the test set is the probability of the positive class (imminent, class label 1), or P(class = imminent), as the very good explanation from @KFrank points out. And since you have an imbalanced dataset, you can use the `pos_weight` argument of `BCEWithLogitsLoss` to give more weight to the positive class, which has fewer examples.
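For example (a sketch, assuming the 0.2 prevalence you mention; a prevalence of 0.2 gives a negative-to-positive ratio of 4:1, so the positive class can be up-weighted by 4):

```python
import torch

# Positive-class prevalence is 0.2, so negatives outnumber positives 4:1.
# pos_weight multiplies the loss term of the positive examples.
pos_weight = torch.tensor([0.8 / 0.2])  # = 4.0
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.tensor([0.3, -1.2])   # toy model outputs
targets = torch.tensor([1.0, 0.0])   # imminent, non-imminent

loss = criterion(logits, targets)
print(loss.item())
```

With a positive example in the batch, this weighted loss comes out larger than the unweighted `BCEWithLogitsLoss()` on the same inputs, which is the point: mistakes on the rare class are penalized more.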

Hello Shaun!

In short, “class ‘1’” means whatever you trained your model
for it to mean.

To explain this, let me go back to one of your earlier posts:

You talk about `x_test` and `y_test` (and `y_pred`). I assume
that `y_test` is a vector of length the number of test samples,
with each value in `y_test` being the number 0, meaning that
this sample is in class “0”, or the number 1, meaning class “1”.
(And `x_test` is the input data whose classes you are trying
to predict.)

You don’t mention it, but I assume you also have an `x_train`
and `y_train` whose meanings are analogous, and that you
used `x_train` and `y_train` to train your network. The point
is that the meaning of the output of your model (whether a
given value for `y_pred = classifier(x_test)` means
class “0” or class “1”) depends on how you trained your model.

If you train your model with values of `y_train` of 1 indicating
class “1” (and feed it into `BCEWithLogitsLoss`) then larger
(more positive) values of `y_pred` will mean that you are
predicting class “1” to be more likely and smaller (more negative)
values of `y_pred` predict class “0” to be more likely (and
class “1” to be less likely), with the predicted probability of
class “1” given by the `sigmoid` of `y_pred`, as discussed in
the earlier posts.

To summarize, the meaning of `y_pred` depends in a
straightforward way on the meaning of the `y_train` that you
used to train your model.
Best.

K. Frank

1 Like

Do you have any reference to:

but you have to kind of parse through equations to get this information

??

Hi Paulo!

Yes, but you have to read between the lines a little bit.

In short, for the binary (two-class) classification problem, if
you use 0 and 1 as the class labels, such class labels can
be understood as probabilities.

This is at least implicit in the BCEWithLogitsLoss documentation,
where we have the equation (formatted more nicely in the link):

ℓ(x, y) = L = {l_1, …, l_N}ᵀ,  l_n = −w_n [y_n · log σ(x_n) + (1 − y_n) · log(1 − σ(x_n))]

We have two classes. Understanding y_n to be the given,
known probability of one of the two classes, and therefore
1 − y_n to be the given probability of the other, we recognize
−[y_n · log σ(x_n) + (1 − y_n) · log(1 − σ(x_n))] to be the
cross-entropy for σ(x_n) as the predicted probability of one
class (and therefore 1 − σ(x_n) as the predicted probability
of the other). (x_n is the logit for that class predicted by your
model and σ(x_n) = sigmoid(x_n) is the predicted probability
for that class.)

So this equation is telling us how we must interpret y_n.

In the usual case for labelled training data, y_n is the class
label and is therefore equal to 0 or 1. But this agrees
with understanding y_n as a probability: y_n = 0 means
0% probability of being in class “1”, which means 100%
probability of being in class “0”. And y_n = 1 means 100%
probability of being in class “1”.
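This can be checked numerically by coding the per-sample term directly (a sketch with made-up logits; the default reduction is the mean over the batch, with w_n = 1):

```python
import torch

x = torch.tensor([0.5, -1.0, 2.0])   # logits x_n from the model
y = torch.tensor([1.0, 0.0, 1.0])    # targets y_n, read as P(class "1")

# The per-sample term from the documentation:
# l_n = -[ y_n * log sigmoid(x_n) + (1 - y_n) * log(1 - sigmoid(x_n)) ]
sigma = torch.sigmoid(x)
l_n = -(y * sigma.log() + (1 - y) * (1 - sigma).log())

# Mean over the batch matches BCEWithLogitsLoss
print(torch.allclose(l_n.mean(), torch.nn.BCEWithLogitsLoss()(x, y)))  # True
```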

To repeat this with slightly different wording:

y_n = Prob(class “1”) = 0 --> class “0”, and
y_n = Prob(class “1”) = 1 --> class “1”.

(Later in the documentation, the y_n are referred to as “the
targets t[i],” a change of notation that doesn’t help matters.)

It would be much more understandable if the documentation
made clear that this is how the class labels enter into
the loss function and gave a concrete example. But drilling
down into the equation for the loss function does tell us
what the class labels have to mean.

I hope that this is what you were looking for and explains
what I was referring to in my earlier post.

Best.

K. Frank

4 Likes

Thanks, Frank. It helped a lot!

Hello sir. I’m using sigmoid activation for prediction. I have 30 images in my test set. The output is in the form Image, Prediction, Label:

1. `tensor([0.94, 0.123])`, label `tensor([1., 0.])`
2. `tensor([0.12, 0.76])`, label `tensor([0., 1.])`
3. `tensor([0.96, 0.153])`, label `tensor([1., 0.])`
4. `tensor([0.94, 0.23])`, label `tensor([1., 0.])`

And so on for 30 images.
Can you please explain what the labels mean and how I can predict a single probability?