How to get the accuracy without a softmax layer?

I’m trying to fine-tune VGG16. Here is its classifier
(I have already changed the last output layer):

(classifier): Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace)
  (2): Dropout(p=0.5)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
  (4): ReLU(inplace)
  (5): Dropout(p=0.5)
  (6): Linear(in_features=4096, out_features=1, bias=True)
)
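
For reference, changing that last layer can be done with something like this (a minimal sketch using torchvision’s VGG16 — not necessarily exactly what I did, and the pretrained loading call may differ across torchvision versions):

import torch.nn as nn
from torchvision import models

# Load the pretrained VGG16 and swap the final 1000-class layer
# for a single-logit output (binary classification).
model = models.vgg16(pretrained=True)
model.classifier[6] = nn.Linear(4096, 1)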

My question is: when I use the model to predict, its outputs are unbounded. For example, I got

tensor([[ 0.9261],
        [ 0.6800],
        [ 0.5750],
        [ 0.5498],
        [ 0.6597],
        [ 0.7453],
        [ 0.5137],
        [ 0.6788],
        [ 1.0495],
        [ 0.7216],
        [-0.2671],
...

I’m using nn.BCEWithLogitsLoss()
(is nn.BCEWithLogitsLoss() better than nn.BCELoss()?),
so I can’t (shouldn’t) call output = torch.sigmoid(output) before the loss, and there is no softmax layer in the model. What is the correct way to get the accuracy? (The labels are 0 or 1.)

The way I can think of is:

output = torch.sigmoid(output)
if 0 <= output < 0.5:
    prediction = 0  # prediction is label 0
else:
    prediction = 1  # prediction is label 1

But this means

  • the loss is computed from the raw output,
  • the accuracy is computed from torch.sigmoid(output).

Can I do it like this? Since the loss and the accuracy would be computed from different data, isn’t that mathematically inconsistent?

Yes, your own answer makes sense :wink:
But you can do it even more simply: comparing the output of the sigmoid to 0.5 is equivalent to comparing the input of the sigmoid to 0 (see Wikipedia).
So you don’t need to call .sigmoid() at all; just check where output < 0.
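
For example, something like this (a small self-contained sketch; the tensors are made-up examples, not your real data):

import torch

logits = torch.tensor([[0.9261], [0.6800], [-0.2671]])  # raw model outputs (batch_size, 1)
labels = torch.tensor([1., 1., 0.])                      # ground-truth 0/1 labels

# sigmoid(x) >= 0.5 exactly when x >= 0, so threshold the logits directly
preds = (logits.squeeze(1) >= 0).float()
accuracy = (preds == labels).float().mean()
print(accuracy)  # tensor(1.)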


Oh…

comparing the output of the sigmoid to 0.5 is equivalent to comparing the input of the sigmoid to 0

Yes, thank you for your suggestion :blush:
But why is

  • computing the loss from the raw output, and
  • computing the accuracy from torch.sigmoid(output)

the right way to get the loss and the accuracy? :flushed:
Shouldn’t we compute both the loss and the accuracy from the same data?

Because calling nn.BCEWithLogitsLoss is equivalent to calling nn.Sigmoid first and then nn.BCELoss.
Those two are so often called one after the other that nn.BCEWithLogitsLoss was designed to do both in one step, and do it better.
To quote the docs:
This loss combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability.
(no idea what that means)

So in both cases (for the loss and for the accuracy) it is the output of the sigmoid that is taken into account. It’s just hidden behind a trick for the loss, and for the accuracy you don’t really need to compute it at all, since you can just compare the raw logits with 0.
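
If you want to convince yourself, here is a tiny sketch (with made-up numbers) showing that feeding the raw logits to nn.BCEWithLogitsLoss gives the same value as applying torch.sigmoid and then nn.BCELoss:

import torch
import torch.nn as nn

logits = torch.tensor([[0.9261], [0.6800], [-0.2671]])
targets = torch.tensor([[1.], [1.], [0.]])

loss_fused = nn.BCEWithLogitsLoss()(logits, targets)       # sigmoid + BCE in one numerically stable op
loss_split = nn.BCELoss()(torch.sigmoid(logits), targets)  # same math, done in two steps
print(loss_fused.item(), loss_split.item())  # both print (approximately) the same value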


Oh… I got it… :flushed:
Truly appreciate your timely help :smile: