Confused about binary classification with PyTorch

I have 5 classes and would like to use binary classification on one of them.

This is my model:

    # Load a pretrained ResNet-50 and replace its final fully connected layer
    model = models.resnet50(pretrained=pretrain_status)
    num_ftrs = model.fc.in_features
    model.fc = nn.Sequential(
        nn.Dropout(dropout_rate),
        nn.Linear(num_ftrs, 2))

I then split my dataset into two folders: one with the class I want to predict (1) and one with the rest (0, 2, 3, 4).

However, this setup produces two outputs per sample, whereas, as I understand it, binary classification should produce a single prediction (True/False).

My question is: how do you properly set up a binary classification model in PyTorch?

Hi,

For binary classification, you need only one logit, so a linear layer that maps its input to a single neuron is adequate. You then need to put a threshold on the logit output by the linear layer. Alternatively, it is more natural to make the last layer an activation such as sigmoid and threshold the resulting probability.

Thanks! Would you mind giving me an example with code? I’ve never done it.

What loss do you recommend? I’m currently using CrossEntropy.

I have one concern regarding the use of a single output: I'm worried that if the model only sees the class I'm trying to predict, it will later predict classes 0, 2, 3, 4 as class 1.

If you want to define your model from scratch, use this tutorial. But based on your sample code, it seems you are using transfer learning. Here is the tutorial for transfer learning.

For the binary case, BCELoss is a good choice.
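
As a minimal sketch (reusing pretrain_status and dropout_rate from your first post), the head and loss could look like this:

    import torch.nn as nn
    from torchvision import models

    # Single-logit head ending in a sigmoid; BCELoss then expects
    # probabilities in [0, 1] and float targets of the same shape.
    model = models.resnet50(pretrained=pretrain_status)
    num_ftrs = model.fc.in_features
    model.fc = nn.Sequential(
        nn.Dropout(dropout_rate),
        nn.Linear(num_ftrs, 1),
        nn.Sigmoid())

    criterion = nn.BCELoss()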

Why should the model see only class-1 samples? In the first post, you mentioned that you are using two separate folders, one for class 1 and the other for classes 0, 2, 3, 4. So the model will see all samples and learn class 1 as 1, and classes 0, 2, 3, 4 as "not 1", i.e. 0.
You can also define a separate binary model for each class and combine them; this approach is called one-vs-all.
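
As a rough sketch of the one-vs-all idea (binary_models here is a hypothetical list of five single-logit models, one per class, and image_batch a batch of inputs):

    import torch

    # One-vs-all: each binary model scores its own class;
    # pick the class whose model is most confident.
    logits = torch.cat([m(image_batch) for m in binary_models], dim=1)  # [batch, 5]
    predicted_class = logits.argmax(dim=1)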

Hello Alex and Doosti!

Just to clarify something, for a binary-classification problem, you
are best off using the logits that come out of a final Linear layer,
with no threshold or Sigmoid activation, and feeding them into
BCEWithLogitsLoss. (Using Sigmoid and BCELoss is less
numerically stable.)

And, as Doosti recommended, your last layer should have a single
output, rather than 2. Thus:

    nn.Linear(num_ftrs, 1))
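
For concreteness, a minimal sketch of this setup (reusing model, num_ftrs, and dropout_rate from the first post):

    import torch.nn as nn

    # Single logit, no Sigmoid; BCEWithLogitsLoss applies the sigmoid
    # internally in a numerically stable way.
    model.fc = nn.Sequential(
        nn.Dropout(dropout_rate),
        nn.Linear(num_ftrs, 1))

    criterion = nn.BCEWithLogitsLoss()
    # Targets must be floats with the same shape as the logits, e.g.:
    # loss = criterion(model(image_batch), label_batch.float().view(-1, 1))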

Best.

K. Frank


Thanks!

What do you use instead of argmax?

Hi Alex!

The short answer is that you threshold your single logit output
against 0.0, rather than running a set of nClass outputs through
argmax().

Let me confirm what I think you are asking:

In addition to calculating your loss function (used for training), you
often also want to calculate the accuracy of your predictions.

For a multi-class classification problem, you typically pass a set
of nClass predicted logits (or predicted probabilities) through
argmax() to get the single predicted integer class label (that
you then compare with your known class label). I assume that
this is the "argmax" you are talking about.

For a binary problem, your last Linear layer will output a single
predicted logit for the sample being in class-"1" (as opposed to
being in class-"0"). (Or, if you pass this logit through a sigmoid(),
you will get the predicted probability of the sample being in class-"1".)

In this case you threshold the output to get a binary prediction:
logit > 0.0 == True means you predict that the sample is
in class-"1" (and logit > 0.0 == False means class-"0"). (If
you are working with probabilities, then prob > 0.5 == True
means class-"1".) You then compare this prediction with the known
class-"0" / class-"1" binary label for the sample in question.
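
In code, a sketch of this accuracy calculation might look like the following (val_dl, model, and DEVICE are placeholders for your own loader, model, and device):

    import torch

    # Count correct binary predictions by thresholding the logits at 0.0.
    correct, total = 0, 0
    with torch.no_grad():
        for image_batch, label_batch in val_dl:
            image_batch = image_batch.to(DEVICE)
            label_batch = label_batch.to(DEVICE).view(-1, 1)
            logits = model(image_batch)            # shape [batch, 1]
            preds = (logits > 0.0).long()          # same as sigmoid(logits) > 0.5
            correct += (preds == label_batch.long()).sum().item()
            total += label_batch.numel()
    accuracy = correct / total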

Good luck

K. Frank


Thanks for that detailed answer! It's exactly what I'm trying to do. Here's my code:

    criterion = nn.BCEWithLogitsLoss()
    total_val_loss = 0.0

    for image_batch, label_batch in val_dl:
        image_batch, label_batch = image_batch.to(DEVICE), label_batch.to(DEVICE)

        pred_batch = model(image_batch)        # shape [batch, 1]
        label_batch = label_batch.view(-1, 1)  # match the output shape
        batch_loss = criterion(pred_batch.double(), label_batch.double())
        total_val_loss += batch_loss.item()

Here is the pred_batch output:

    [[-0.7402], [-1.0285], [-0.7212], [-0.6358], [-0.4438], [-0.8488], ...]

From here, do I have to use sigmoid to get 0/1?

In fact, I just tried:

    pred_batch[pred_batch >= 0] = 1
    pred_batch[pred_batch < 0] = 0

and

    pred_batch = torch.sigmoid(pred_batch)
    pred_batch[pred_batch >= 0.5] = 1
    pred_batch[pred_batch < 0.5] = 0

but in both cases the model is not learning.