Output of the binary classification model

I’m using PyTorch to build a model for binary classification.

import torch
import torch.nn as nn
import torch.nn.functional as F

class XceptionLike(nn.Module):
    def __init__(self):
        super().__init__()
        # CNN part
        ...
        # final fully connected layers
        self.fc_last_1 = nn.Linear(384, 64)
        self.fc_last_2 = nn.Linear(64, 1)

    def forward(self, input_16, input_32, input_48):
        # CNN part
        ...
        # concatenate the three branch outputs along the feature dimension
        out = torch.cat([out_16, out_32, out_48], dim=1)
        out = F.relu(self.fc_last_1(out))
        out = F.relu(self.fc_last_2(out))  # ReLU applied to the final layer
        return out

# the model output part
output = model(batch_z_16, batch_z_32, batch_z_48) # model prediction
output = torch.sigmoid(output)
loss = criterion(output, batch_label)

What confuses me is: can this model really be used for binary classification?
In my understanding, for binary classification:

  • a model output in [0, 0.5) means a prediction for one class.

  • a model output in [0.5, 1] means a prediction for the other class.

But the ReLU function returns values in [0, +inf),
and when the sigmoid function receives the output of the model,
it returns values in [0.5, 1), so the model can't return anything in [0, 0.5), which means it can't predict the class that corresponds to [0, 0.5).
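
A quick check (a standalone snippet, not part of the model) illustrates the issue:

import torch
import torch.nn.functional as F

x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])
print(torch.sigmoid(F.relu(x)))
# tensor([0.5000, 0.5000, 0.5000, 0.7311, 0.9526])
# every value is >= 0.5, so one of the two classes can never be predicted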

What is wrong with my understanding?
How can I deal with it?

At the last layer, I should not use out = F.relu(out) but out = torch.sigmoid(out); then the model can output values across (0, 1), so it can predict both classes ([0, 0.5) and [0.5, 1]).

    def forward(self, input_16, input_32, input_48):
        # CNN part
        ...
        out = torch.cat([out_16, out_32, out_48], dim=1)
        out = F.relu(self.fc_last_1(out))
        out = self.fc_last_2(out)   # no ReLU on the final layer
        out = torch.sigmoid(out)    # probability in (0, 1)
        return out

# the model output part
output = model(batch_z_16, batch_z_32, batch_z_48) # model prediction
loss = criterion(output, batch_label)
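
As a side note: another option would be to return the raw logits from forward (no sigmoid) and use nn.BCEWithLogitsLoss, which applies the sigmoid internally and is more numerically stable. A minimal sketch, assuming the criterion above is nn.BCELoss:

import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

# forward would end with `return self.fc_last_2(out)` (no sigmoid)
logits = model(batch_z_16, batch_z_32, batch_z_48)
loss = criterion(logits, batch_label)

# apply the sigmoid only when a probability is needed, e.g. at inference
probs = torch.sigmoid(logits)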

Is this the correct answer?

Yes, you should use the sigmoid function.

def sigmoid(x): return 1/(1 + (-x).exp())

It maps the space (-inf, inf) onto a probability in (0, 1).
Note that this sigmoid works elementwise on a tensor, so it will do that for all of your activations at once.
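
For example (a standalone check), it matches torch.sigmoid elementwise:

import torch

def sigmoid(x): return 1/(1 + (-x).exp())

x = torch.tensor([-2.0, 0.0, 2.0])
print(sigmoid(x))        # tensor([0.1192, 0.5000, 0.8808])
print(torch.sigmoid(x))  # identical values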

Whatever goes into the sigmoid is conventionally called a "logit", even though no mathematical logit function is applied anywhere; the name fits because the logit, log(p / (1 - p)), is the inverse of the sigmoid.

After that you will use binary cross-entropy (BCE), which averages the loss over the batch:

def binary_cross_entropy(p, y): return -(p.log()*y + (1-y)*(1-p).log()).mean()

Note that the sigmoid used the exponential function to produce the probabilities, and now we take the log of those probabilities.
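
As a quick sanity check, the function above agrees with PyTorch's built-in F.binary_cross_entropy:

import torch
import torch.nn.functional as F

def binary_cross_entropy(p, y): return -(p.log()*y + (1-y)*(1-p).log()).mean()

p = torch.tensor([0.9, 0.2, 0.7])  # predicted probabilities in (0, 1)
y = torch.tensor([1.0, 0.0, 1.0])  # binary targets
print(binary_cross_entropy(p, y))    # tensor(0.2284)
print(F.binary_cross_entropy(p, y))  # tensor(0.2284)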


I got it.
Appreciate your detailed explanation.
