# What Loss function to use in Binary CNN Classification problem

I am running a Transfer Learning scenario with a ResNet model. The original work was a classifier with hundreds of classes, and it used the CrossEntropyLoss function `nn.CrossEntropyLoss()`.
A thread here suggests `BCELoss`, but there is also `BCEWithLogitsLoss`, which seems to fit.

In a confusion matrix, I want to optimize for the least amount of False Positives, even if it hurts my True Positive score.
Which of these loss functions should I use? And why?

Hi Rafael!

For a binary classification problem, `BCEWithLogitsLoss`
should be your go-to loss function. (You would only want to
use `BCELoss` if your network naturally emits probabilities,
which it almost certainly doesn’t.)
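In code, a minimal sketch (toy numbers, not from the thread) of the difference: `BCEWithLogitsLoss` takes raw logits directly, whereas `BCELoss` requires you to apply the sigmoid yourself:

```python
import torch
import torch.nn as nn

# Toy batch: 4 raw logits from a hypothetical binary model, 4 labels.
logits = torch.tensor([1.2, -0.8, 0.3, -2.0])
targets = torch.tensor([1.0, 0.0, 1.0, 0.0])

# BCEWithLogitsLoss applies the sigmoid internally, using a
# numerically stable log-sum-exp formulation.
loss = nn.BCEWithLogitsLoss()(logits, targets)

# BCELoss gives the same value, but only if you sigmoid by hand,
# and that two-step route can lose precision for large-magnitude logits.
loss_bce = nn.BCELoss()(torch.sigmoid(logits), targets)
```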

Consider carefully the trade-off that you are implying here. As
an extreme, you could simply classify everything as “negative”.
You now achieve the “least amount of False Positives” (none
at all), but, of course, your True Positive score is also really
low (zero, in fact).

So I assume your real trade-off is that you are willing to reduce
your True Positives some, but not too much, if you can get a
substantial reduction in your False Positives.

A sensible approach to achieve this would be to weight your
“negative” samples more heavily than your “positive” samples.
`BCEWithLogitsLoss` has a `pos_weight` argument for this purpose:
a `pos_weight` less than one down-weights the “positive” samples,
which weights the “negative” samples relatively more heavily.

The relative weight between your “negative” and “positive” samples
will determine how much you train your network to reduce your
False Positives at the cost of reducing your True Positives.
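As a sketch of this weighting (illustrative numbers; `pos_weight` is the built-in knob in `BCEWithLogitsLoss` for scaling the positive-class term):

```python
import torch
import torch.nn as nn

logits = torch.tensor([2.0, -1.0, 0.5, -0.5])   # made-up raw model outputs
targets = torch.tensor([1.0, 0.0, 1.0, 0.0])    # binary labels

# pos_weight < 1 down-weights the "positive" terms of the loss,
# i.e. it weights the "negative" samples relatively more heavily,
# pushing training toward fewer False Positives. 0.5 is illustrative.
weighted = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(0.5))
plain = nn.BCEWithLogitsLoss()

loss_weighted = weighted(logits, targets)
loss_plain = plain(logits, targets)
```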

Good luck.

K. Frank


Thanks! Very informative.

I’m just trying to figure out what changes when using `CrossEntropyLoss().cuda()` and `BCEWithLogitsLoss().cuda()`.

Just placing BCE in place of CE throws me this error:
`ValueError: Target size (torch.Size([64])) must be the same as input size (torch.Size([64, 2]))`

This is a snippet of the training step:

```
for i, (input, target) in enumerate(val_loader):
    target = target.cuda(non_blocking=True)
    input = input.cuda(non_blocking=True)

    # compute output
    output = model(input)
    loss = criterion(output, target)  # <- error here!
```

Here are the two variables passed into `criterion()`:

```
(Pdb) output
tensor([[-0.2657,  0.1728],
[ 0.3407, -0.6961],
[ 0.8020, -0.8201],
[ 0.1457,  0.0311],
[-0.2517,  0.0223],
[-0.1266, -0.3978],
[ 0.4527, -0.6096],
[ 0.2077, -0.1428],
[-0.1205, -0.5252],
[ 0.5462, -0.3988],
[-0.1215, -0.1321],
[ 0.3062, -0.5417],
[ 0.0723, -0.0537],
[-0.5435, -1.1898],
[ 0.0718, -0.0986],
[ 0.0118, -0.0860],
[-0.0998, -0.8494],
[-0.2591, -0.4207],
[ 0.2687, -0.6160],
[-0.2336, -0.4814],
[-0.1896, -0.1463],
[ 0.4623, -0.5179],
[-0.3181, -0.3042],
[-0.2550, -0.1824],
[-0.6250, -0.1293],
[-0.8920,  0.1077],
[ 0.0013, -0.1081],
[-0.2565, -0.0777],
[-0.2360, -0.3112],
[ 0.0615, -0.3419],
[-0.4794, -0.1323],
[-0.0624,  0.1003],
[ 0.1803, -0.2833],
[-0.0859,  0.0516],
[-0.0256, -0.4226],
[-0.6047, -0.3403],
[ 0.2778, -0.6168],
[ 0.0973, -0.3736],
[-0.2165, -0.2941],
[ 0.0252, -0.2497],
[-0.1285, -0.3079],
[-0.3292, -0.5657],
[ 0.1660, -0.5869],
[-0.1829, -0.3313],
[-0.5305,  0.0671],
[ 0.2120, -0.5442],
[-0.1197, -0.0711],
[ 0.2132, -0.5229],
[-0.0977, -0.3243],
[ 0.1694, -0.2342],
[ 0.0137, -0.3607],
[-0.3495, -0.2702],
[ 0.3058, -0.8327],
[ 0.4417, -0.7817],
[-0.7523, -0.5299],
[ 0.0826, -0.3280],
[-0.4834, -0.4926],
[-0.5763,  0.0012],
[ 0.0992, -0.8658],
[-0.1066,  0.4763],
[-0.4472,  0.2544],
[-0.3449, -0.1687],
[-0.1852,  0.1073],
```
```
(Pdb) target
tensor([0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1,
0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1,
0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1], device='cuda:0')
```

Hello Rafael!

Without seeing the rest of your code – your model, in particular –
I’m guessing somewhat, but I believe that your problem could be
the following:

If you have a multi-class problem (where “multi” implies more
than two, and two classes is what we call a “binary” problem)
with nClass classes, then the output of your model should be
nClass logits. (This is what `CrossEntropyLoss` expects.)

In contrast, for a binary problem, the output of your model should
be a single logit (not two), conventionally taken to be the logit for
your “positive” class. This is what `BCEWithLogitsLoss` expects.

If you build your binary-problem model as a two-class multi-class
model, then you will (redundantly) have two logits as your output
(one for your “negative” class, as well as for your “positive” class).
This won’t match what `BCEWithLogitsLoss` expects, so I would
think you would get the error you report.
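For what it’s worth, that shape mismatch is easy to reproduce in isolation (a minimal sketch with random tensors standing in for the model output):

```python
import torch
import torch.nn as nn

output = torch.randn(64, 2)                  # two logits per sample, shape [64, 2]
target = torch.randint(0, 2, (64,)).float()  # binary labels, shape [64]

criterion = nn.BCEWithLogitsLoss()
try:
    criterion(output, target)                # mismatched shapes
except ValueError as err:
    print(err)  # the "Target size ... must be the same as input size ..." error
```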

(You can build your binary model as a two-class multi-class
model, but then you should feed the model’s output into
`CrossEntropyLoss`. If you match everything up right, you
should get the same results as you would with a conventional
binary model feeding `BCEWithLogitsLoss`, but you probably
lose a more or less insignificant bit of efficiency.)
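That equivalence can be checked directly with a bare `nn.Linear` standing in for the network’s final layer (a toy sketch with a hypothetical 8-feature input; the single logit is constructed as the difference of the two logits):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in heads (the real model would be e.g. a ResNet
# whose final fc layer is replaced).
torch.manual_seed(0)
two_logit_head = nn.Linear(8, 2)   # two-class "multi-class" formulation
one_logit_head = nn.Linear(8, 1)   # conventional binary formulation

# Make the single logit equal (positive logit - negative logit),
# which is the equivalence described above.
with torch.no_grad():
    one_logit_head.weight.copy_(two_logit_head.weight[1] - two_logit_head.weight[0])
    one_logit_head.bias.copy_(two_logit_head.bias[1] - two_logit_head.bias[0])

x = torch.randn(4, 8)
target = torch.tensor([0, 1, 1, 0])

# Two logits -> CrossEntropyLoss, integer class-index targets, shape [4, 2] vs [4].
ce_loss = nn.CrossEntropyLoss()(two_logit_head(x), target)

# One logit -> BCEWithLogitsLoss, float targets, shapes squeezed to [4] vs [4].
bce_loss = nn.BCEWithLogitsLoss()(one_logit_head(x).squeeze(1), target.float())
```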

Best.

K. Frank
