Problem with fine-tuning for binary classification

Hello guys!

I’m working on fine-tuning a MobileNetV3 for binary classification. My first attempt was to change the classification layer to output one feature, use BCEWithLogitsLoss as my loss function, and use torch.sigmoid for prediction.

Now, I want to check if the result would be better if I change it to output 2 features and use BCELoss as my loss function, and here is where I’m stuck.

Firstly, I only changed the code to set 2 output features, and with that I got this error:

Using a target size (torch.Size([64])) that is different to the input size (torch.Size([64, 2])) is deprecated. Please ensure they have the same size.

After researching a little, I discovered that the right way is to use one-hot labels with this loss function, so I changed the implementation to use one-hot labels.

The implementation is this:

# build a [batch_size, 2] one-hot tensor from the integer class labels
labels_one_hot = torch.zeros(labels.size(0), 2).to(device)
labels_one_hot.scatter_(1, labels.unsqueeze(1), 1)
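(For reference, the same tensor can also be built more directly with `torch.nn.functional.one_hot`; this is just a sketch assuming `labels` holds int64 class indices, as a DataLoader typically yields:)

```python
import torch
import torch.nn.functional as F

labels = torch.tensor([0, 1, 1, 0])  # int64 class indices, shape [4]

# one_hot returns int64, so cast to float for use as a loss target
labels_one_hot = F.one_hot(labels, num_classes=2).float()  # shape [4, 2]
print(labels_one_hot)
```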

With this change, I got this error:

../aten/src/ATen/native/cuda/Loss.cu:94: operator(): block: [0,0,0], thread: [2,0,0] Assertion `input_val >= zero && input_val <= one` failed.

Searching again, I found that this error is related to out of range exceptions, and that probably the range of my labels tensor is not between 0 and num_classes-1.

So, to check that I printed both my labels and one_hot_labels tensors and got this:

labels: tensor([0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1,
        0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0,
        1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0], device='cuda:0')

one hot: tensor([[1., 0.], [1., 0.], [1., 0.], [1., 0.], [1., 0.], [1., 0.], [0., 1.], [1., 0.], [0., 1.], [1., 0.], [0., 1.], [1., 0.], [0., 1.], [1., 0.], [0., 1.], [0., 1.], [1., 0.], [0., 1.], [0., 1.], [0., 1.], [1., 0.], [1., 0.], [0., 1.], [0., 1... [1., 0.], [0., 1.], [1., 0.], [1., 0.], [1., 0.], [0., 1.], [1., 0.], [1., 0.], [0., 1.], [0., 1.], [1., 0.], [0., 1.], [1., 0.], [1., 0.], [1., 0.], [0., 1.], [1., 0.], [1., 0.]], device='cuda:0')

So, it seems that the range is right (between 0 and 1); the only difference is that the one_hot tensor is float and the labels are int. With that in mind, I decided to force labels_one_hot to int type, and now the “out of range” error disappeared, but I got this error:

Found dtype Int but expected Float

Long story short: if I use float I get an “out of range” error, but if I change to int that error disappears and I get a “Found dtype Int but expected Float” error.

Do you guys know what I could be doing wrong?

Hi Paulo!

As an aside, just to be clear, you do not want to pass the output of your
classification layer through sigmoid() and then to BCEWithLogitsLoss.

You do not want to use BCELoss with a two-output classification layer. It is
true that by using a two-output layer you are logically performing a binary
classification, but it is set up as a multi-class classification that happens to
have two (therefore binary) classes.

For a multi-class problem you would want to use CrossEntropyLoss.

It is reasonable to perform binary classification as a two-class multi-class
classification using a two-output classification layer and CrossEntropyLoss.
The two approaches are very similar, and in my experience, it doesn’t really
matter which you use.

(My preference is to use a single-output layer with BCEWithLogitsLoss
as theoretically this should be ever so slightly more efficient.)
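To see concretely why the two approaches are so similar, here is a minimal sketch with made-up logits: cross entropy over two logits is exactly BCEWithLogitsLoss applied to the difference of those logits.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits2 = torch.randn(8, 2)          # output of a two-logit classification layer
target = torch.randint(0, 2, (8,))   # integer class labels (0 or 1)

# multi-class formulation: two logits, integer targets
ce = nn.CrossEntropyLoss()(logits2, target)

# binary formulation: one logit (the difference), float targets
single_logit = logits2[:, 1] - logits2[:, 0]
bce = nn.BCEWithLogitsLoss()(single_logit, target.float())

print(torch.allclose(ce, bce, atol=1e-6))  # True: the two losses coincide
```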

Before we delve into your specific errors, first sort out whether you should
be using CrossEntropyLoss for your two-output approach. (Please post
follow-up questions if you still get errors.)

Best.

K. Frank

Hey K. Frank, thanks for your answer!

As an aside, just to be clear, you do not want to pass the output of your
classification layer through sigmoid() and then to BCEWithLogitsLoss.

Just to clarify, I’m not passing the results through a sigmoid and then to BCEWithLogitsLoss. Maybe my English wasn’t the best when I wrote this topic haha.

I already fine-tuned the MobileNetV3 with a one-output classification layer and got my results, and now I decided to check if the result is different if I use a two-output classification layer.

I decided to use BCELoss (Binary Cross Entropy Loss) as my loss function in the training process because it seemed to make sense, but if it’s not right I can change it.

The issue I’m facing is that, if I use BCELoss, I need to convert my labels tensor of shape [64] to shape [64, 2], and the indicated way I found to do that is to use one-hot-encoded labels.

Using a target size (torch.Size([64])) that is different to the input size (torch.Size([64, 2])) is deprecated. Please ensure they have the same size.

While looking for a way to create my one-hot-encoded labels tensor, I found this implementation:

# build a [batch_size, 2] one-hot tensor from the integer class labels
labels_one_hot = torch.zeros(labels.size(0), 2).to(device)
labels_one_hot.scatter_(1, labels.unsqueeze(1), 1)

But with this implementation I got this error:

../aten/src/ATen/native/cuda/Loss.cu:94: operator(): block: [0,0,0], thread: [127,0,0] Assertion `input_val >= zero && input_val <= one` failed.

It tells me that my labels aren’t between 0 and 1 (num_classes - 1), but when I print the tensor I can see that my labels are between 0 and 1, and this is where I’m stuck.

one hot: tensor([[1., 0.], [1., 0.], [1., 0.], [1., 0.], [1., 0.], [1., 0.], [0., 1.], [1., 0.], [0., 1.], [1., 0.],
[0., 1.], [1., 0.], [0., 1.], [1., 0.], [0., 1.], [0., 1.], [1., 0.], [0., 1.], [0., 1.], [0., 1.], [1., 0.], [1., 0.],
[0., 1.], [0., 1... [1., 0.], [0., 1.], [1., 0.], [1., 0.], [1., 0.], [0., 1.], [1., 0.], [1., 0.], [0., 1.], [0., 1.],
[1., 0.], [0., 1.], [1., 0.], [1., 0.], [1., 0.], [0., 1.], [1., 0.], [1., 0.]], device='cuda:0')

I checked and if I use the CrossEntropyLoss it works perfectly, but I still want to try the BCELoss if it makes sense.

No, it doesn’t, since you would be creating a multi-label classification setup, as was already explained. nn.BCELoss and nn.BCEWithLogitsLoss are used for the same use cases (binary or multi-label classification); the main difference between them is that the former requires a probability input tensor (usually obtained via sigmoid) and is numerically less stable.
nn.CrossEntropyLoss with two output logits is the correct alternative.
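A minimal sketch (with made-up logits) of that difference: the two losses agree once sigmoid is applied for nn.BCELoss, but nn.BCEWithLogitsLoss fuses the sigmoid into the loss for better numerical stability.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(8)                      # raw single-logit model output
target = torch.randint(0, 2, (8,)).float()   # float targets (0.0 or 1.0)

# nn.BCELoss requires probabilities in [0, 1], so sigmoid must come first;
# feeding it raw logits triggers the `input_val >= zero && input_val <= one` assert
loss_bce = nn.BCELoss()(torch.sigmoid(logits), target)

# nn.BCEWithLogitsLoss takes the raw logits directly
loss_bcewl = nn.BCEWithLogitsLoss()(logits, target)

print(torch.allclose(loss_bce, loss_bcewl, atol=1e-6))  # True
```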

Hi guys, sorry for the late response

Thank you so much for the help, I now understand why it isn’t a good idea to use BCELoss here. I made the modifications and I’m now using CrossEntropyLoss, as mentioned by KFrank and you.

I will mark this post as solved =D.