# Why does BCEWithLogitsLoss compute a different value from CrossEntropyLoss?

Hi,

I am trying out `nn.BCEWithLogitsLoss`, and this is my code:

``````
import torch
import torch.nn as nn

logits = torch.randn(1, 2, 4, 4)
label = torch.randint(0, 2, (1, 4, 4))

criteria_ce = nn.CrossEntropyLoss()
loss = criteria_ce(logits, label)
print(loss)

criteria_bce = nn.BCEWithLogitsLoss()
lb_one_hot = torch.zeros_like(logits).scatter_(1, label.unsqueeze(1), 1)
loss = criteria_bce(logits, lb_one_hot)
print(loss)
``````

In theory, these two losses should have the same value, since both are binary classification losses. Why are the values actually different, with one being 2.01 and the other 0.77?

Hi coincheung!

First the what:

`BCEWithLogitsLoss` expects a single (real) number per sample
that indicates the “strength” of that sample being in the “1”
state (the “yes” state, if you will).

To recover the loss you get with `CrossEntropyLoss` you need to
pass in the difference of your state-1 and state-0 strengths.

This code performs the calculation I think you want:
(For simplicity, I’ve removed two of your dimensions; the labels
are now a vector of five samples, with `labels.shape = [5]`.)

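``````
import torch
import torch.nn as nn

preds = torch.randn(5, 2)            # two "strengths" per sample: (state-0, state-1)
labels = torch.randint(0, 2, (5,))   # integer class labels, 0 or 1

# single logit per sample: state-1 strength minus state-0 strength
logits = preds[:, 1] - preds[:, 0]
# BCEWithLogitsLoss needs floating-point targets
bcelogitsloss = nn.BCEWithLogitsLoss()(logits, labels.float())

celoss = nn.CrossEntropyLoss()(preds, labels)

print(bcelogitsloss, celoss)         # the two values agree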
``````
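
The same idea should carry over to the 4-D shapes in the original question. Here is a minimal sketch, assuming the class dimension is dimension 1 as in the question's code; the names `logits_4d`, `label_4d`, `diff_4d` and the seed are mine, chosen only for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only so the run is repeatable

logits_4d = torch.randn(1, 2, 4, 4)        # (batch, class, H, W)
label_4d = torch.randint(0, 2, (1, 4, 4))  # (batch, H, W), values 0 or 1

ce_4d = nn.CrossEntropyLoss()(logits_4d, label_4d)

# One logit per pixel: class-1 strength minus class-0 strength.
diff_4d = logits_4d[:, 1] - logits_4d[:, 0]  # shape (batch, H, W)
bce_4d = nn.BCEWithLogitsLoss()(diff_4d, label_4d.float())

print(ce_4d, bce_4d)  # the two values should agree up to rounding
```

In the question's version, `BCEWithLogitsLoss` receives both channels and treats each one as an independent yes/no problem, so it averages 32 per-element terms instead of 16 per-pixel ones. That is why the two numbers differ there.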

Now the why:

Classic cross-entropy loss measures the mismatch between
two (discrete) probability distributions. So, for the binary case,
you compare (Q(“no” state), Q(“yes” state)) with (P(“no” state),
P(“yes” state)), where P(“no” state) is the actual (“ground
truth”) probability that your sample is in the “no” state, while
Q(“no” state) is your model’s prediction of this probability.

(As probabilities, they are all between 0 and 1, and P(“no”) +
P(“yes”) = 1, and similarly for the Qs.)

PyTorch’s `CrossEntropyLoss` has a built-in `Softmax` that converts
your model’s predicted “strengths” (relative log-odds-ratios)
into probabilities that sum to one. It also one-hots your labels
so that (in the binary case) label = 1 turns into P(“no”) = 0,
and P(“yes”) = 1. It then calculates the cross-entropy of these
two probability distributions.
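
To make those steps concrete, here is a rough sketch that does the softmax, the one-hot encoding, and the cross-entropy sum by hand, then compares the result with `CrossEntropyLoss`. The names `scores`, `targets`, `q`, and `p` are mine, and this shows only the arithmetic: the library itself works with a log-softmax for numerical stability rather than taking the log of a softmax.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only so the run is repeatable

scores = torch.randn(5, 2)           # hypothetical "strengths", two per sample
targets = torch.randint(0, 2, (5,))  # integer labels, 0 or 1

q = F.softmax(scores, dim=1)                    # Q: predicted probabilities, rows sum to 1
p = F.one_hot(targets, num_classes=2).float()   # P: one-hot ground truth

# Cross-entropy per sample is -sum_s P(s) * log Q(s); then average over samples.
manual_ce = -(p * torch.log(q)).sum(dim=1).mean()
builtin_ce = nn.CrossEntropyLoss()(scores, targets)

print(manual_ce, builtin_ce)  # should agree up to rounding
```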

`BCELoss` calculates this same cross-entropy, but it knows that
it’s the binary case, so you only give it one of the two
probabilities, Q(“yes”), and you can understand the 0 and 1
labels as simply being the values of P(“yes”).

This is illustrated by further running the following code:

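``````
# reuses preds and labels from the code above
softmaxs = nn.Softmax(dim=1)(preds)
# pass only Q("yes"), the state-1 column of the softmax output
bcesoftmaxloss = nn.BCELoss()(softmaxs[:, 1], labels.float())
print(bcesoftmaxloss)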
``````
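
If it helps, the binary formula that `BCELoss` evaluates can also be written out by hand. This is a minimal sketch with made-up inputs; `q_yes` and `p_yes` are my names for Q(“yes”) and P(“yes”), and the probabilities are generated with a sigmoid only to keep them strictly between 0 and 1:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only so the run is repeatable

q_yes = torch.sigmoid(torch.randn(5))      # hypothetical Q("yes") per sample
p_yes = torch.randint(0, 2, (5,)).float()  # labels read as P("yes"), 0.0 or 1.0

# Two-state cross-entropy: P("no") = 1 - P("yes") and Q("no") = 1 - Q("yes").
manual_bce = -(p_yes * torch.log(q_yes)
               + (1 - p_yes) * torch.log(1 - q_yes)).mean()
builtin_bce = nn.BCELoss()(q_yes, p_yes)

print(manual_bce, builtin_bce)  # should agree up to rounding
```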

Just as `CrossEntropyLoss` has a built-in `Softmax` (to convert
“strengths” to probabilities), `BCEWithLogitsLoss` has a built-in
logistic function (`Sigmoid`) to convert the “strength” of the “yes”
state into the probability Q(“yes”). More precisely, the “strength”
is the log-odds-ratio of the “yes” state, also called the “logit”.
That is, `BCEWithLogitsLoss` expects logit(Q(“yes”)) as its input,
and the built-in `Sigmoid` converts it back to Q(“yes”).
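
This is also why the difference of the two strengths is the right input. For two states, softmax gives Q(“yes”) = exp(z1) / (exp(z0) + exp(z1)) = 1 / (1 + exp(-(z1 - z0))), which is the sigmoid of z1 - z0. A quick numerical check, with `strengths` and the other names being mine:

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only so the run is repeatable

strengths = torch.randn(5, 2)  # hypothetical (state-0, state-1) strengths

# Q("yes") via the two-class softmax ...
q_yes_softmax = torch.softmax(strengths, dim=1)[:, 1]

# ... and via the sigmoid of the single logit z1 - z0.
logit_yes = strengths[:, 1] - strengths[:, 0]
q_yes_sigmoid = torch.sigmoid(logit_yes)

# Going the other way, logit(Q) = log(Q / (1 - Q)) recovers z1 - z0.
recovered = torch.log(q_yes_softmax / (1 - q_yes_softmax))

print(q_yes_softmax, q_yes_sigmoid)
print(logit_yes, recovered)
```

Both pairs should match up to floating-point rounding.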

Best regards.

K. Frank
