Output of the binary classification model

I’m using PyTorch to build a model for binary classification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class XceptionLike(nn.Module):
    def __init__(self):
        super().__init__()
        # CNN part
        ...
        # final
        self.fc_last_1 = nn.Linear(384, 64)
        self.fc_last_2 = nn.Linear(64, 1)

    def forward(self, input_16, input_32, input_48):
        # CNN part
        ...
        out = torch.cat([out_16, out_32, out_48], dim=1)
        out = F.relu(self.fc_last_1(out))
        out = F.relu(self.fc_last_2(out))
        return out

# the model output part
output = model(batch_z_16, batch_z_32, batch_z_48)  # model prediction
output = torch.sigmoid(output)
loss = criterion(output, batch_label)
```

What confuses me is: can this model really be used for binary classification?
In my understanding, for binary classification:

• a model output in [0, 0.5) means a prediction for one class.

• a model output in [0.5, 1] means a prediction for the other class.

But the ReLU function returns values in [0, +inf),
and when the sigmoid function is applied to the model's output,
it returns values in [0.5, 1), so the model can't return values in [0, 0.5), which means it can never predict the class that belongs to [0, 0.5).

What is wrong with my understanding?
How can I deal with it?

At the last layer, I should not use `out = F.relu(out)` but `out = torch.sigmoid(out)`; then the model can output the full range [0, 1], and so can predict both classes, [0, 0.5) and [0.5, 1].

```python
    def forward(self, input_16, input_32, input_48):
        # CNN part
        ...
        out = torch.cat([out_16, out_32, out_48], dim=1)
        out = F.relu(self.fc_last_1(out))
        out = self.fc_last_2(out)   # no ReLU on the final layer
        out = torch.sigmoid(out)    # map the logits to [0, 1]
        return out

# the model output part
output = model(batch_z_16, batch_z_32, batch_z_48)  # model prediction
loss = criterion(output, batch_label)
```
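One detail worth noting: if the sigmoid is applied inside the model, the criterion should be `nn.BCELoss`, which expects probabilities. The common alternative is to return raw logits from the model and use `nn.BCEWithLogitsLoss`, which fuses the sigmoid into the loss for better numerical stability. A minimal sketch with hypothetical logits and labels:

```python
import torch
import torch.nn as nn

# hypothetical logits from a model whose final layer has no activation
logits = torch.tensor([[-1.2], [0.3], [2.5]])
labels = torch.tensor([[0.0], [1.0], [1.0]])

# option A: sigmoid inside the model, plain BCELoss as the criterion
loss_a = nn.BCELoss()(torch.sigmoid(logits), labels)

# option B: raw logits out of the model, BCEWithLogitsLoss applies the sigmoid itself
loss_b = nn.BCEWithLogitsLoss()(logits, labels)

print(loss_a, loss_b)  # the two losses agree
```

Both options give the same loss value; option B is generally preferred in practice because it avoids computing `log(sigmoid(x))` in two separate steps.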

Yes, you should use the sigmoid function.

```python
def sigmoid(x): return 1 / (1 + (-x).exp())
```

It maps the space (-inf, inf) onto a probability in (0, 1).
Note that this sigmoid works elementwise on a tensor, so it converts all of your activations at once.
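To illustrate the elementwise behavior, here is the same one-liner applied to a small tensor of made-up values, checked against PyTorch's built-in `torch.sigmoid`:

```python
import torch

def sigmoid(x): return 1 / (1 + (-x).exp())

t = torch.tensor([[-2.0, 0.0], [1.0, 5.0]])
print(sigmoid(t))                                     # applied to every element
print(torch.allclose(sigmoid(t), torch.sigmoid(t)))   # True
```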

Whatever goes into the sigmoid is usually called a "logit", even though it is not produced by the mathematical logit function.

After that you will use binary cross-entropy (BCE), which works on a batch.

```python
def binary_cross_entropy(p, y): return -(p.log()*y + (1-y)*(1-p).log()).mean()
```
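As a sanity check, this hand-rolled BCE agrees with PyTorch's `F.binary_cross_entropy` (the probabilities and labels below are arbitrary examples):

```python
import torch
import torch.nn.functional as F

def binary_cross_entropy(p, y): return -(p.log()*y + (1-y)*(1-p).log()).mean()

p = torch.tensor([0.9, 0.2, 0.7])  # predicted probabilities (after sigmoid)
y = torch.tensor([1.0, 0.0, 1.0])  # ground-truth labels

print(binary_cross_entropy(p, y))
print(torch.allclose(binary_cross_entropy(p, y), F.binary_cross_entropy(p, y)))  # True
```

Both reduce over the batch with a mean, which is also `F.binary_cross_entropy`'s default reduction.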

Note that the sigmoid uses the exponential function to obtain the probabilities, and now we take the log of those probabilities.


I got it