Two output nodes for binary classification

I refer to the model in the paper ‘Deep Learning for the Classification of Lung Nodules’.
The model is like this:

    def forward(self, input_image):
        # (channels, height, width)
        ##
        out = self.conv_1(input_image) # 20 * 44 * 44
        out = self.bn_conv_1(out)

        out = self.pooling_1(out) # 20 * 22 * 22
        out = self.bn_pooling_1(out)

        ##
        out = self.conv_2(out) # 50 * 16 * 16
        out = self.bn_conv_2(out)

        out = self.pooling_2(out) # 50 * 8 * 8
        out = self.bn_pooling_2(out)

        ##
        out = self.conv_3(out) # 500 * 2 * 2
        out = self.bn_conv_3(out)

        out = self.pooling_3(out) # 500 * 1 * 1
        out = self.bn_pooling_3(out)

        out = F.relu(out)
        out = self.conv_4(out) # 2 * 1 * 1

        out = F.softmax(out, dim=1)
        return out

My doubts are:

  1. This is a binary classification model, but the output has two nodes.
    (Generally, a binary classification model has only one output node, and the prediction is judged by whether the output is greater than or less than 0.5.)

  1. Although I don’t know if this is a key consideration, the model has no fully connected layer.

Considering the above, how should I design the loss function?

  1. For a binary classification use case, you could use a single output with a threshold (as you’ve explained), or alternatively you could treat it as multi-class classification with just two classes, so that each class gets its own output neuron. The loss functions for the two approaches differ.
    In the first case (single output), you would use e.g. nn.BCEWithLogitsLoss, and the output tensor shape should match the target shape.
    In the latter case, you would use e.g. nn.CrossEntropyLoss, and the target tensor should contain the class indices in the range [0, nb_classes-1] and should not have the “class dimension” (usually the channel dim).

Both approaches expect logits, so you should remove your softmax layer and just pass the last output to the criterion.

  1. A final linear layer is not strictly necessary if you make sure to work with the right shapes for your output and target.
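The two options above can be sketched like this (a minimal sketch of my own; the batch size, the flattening of the model’s 2 * 1 * 1 output, and the random tensors are assumptions for illustration):

```python
import torch
import torch.nn as nn

batch_size, nb_classes = 4, 2
targets = torch.tensor([0, 1, 1, 0])  # class indices in [0, nb_classes-1]

# Option A: two output neurons -> nn.CrossEntropyLoss on raw logits.
# The model's 2 * 1 * 1 output would first be flattened to (batch_size, 2).
logits_two = torch.randn(batch_size, nb_classes)
loss_a = nn.CrossEntropyLoss()(logits_two, targets)

# Option B: a single output neuron -> nn.BCEWithLogitsLoss,
# where the target must have the same shape (and float dtype) as the output.
logits_one = torch.randn(batch_size, 1)
loss_b = nn.BCEWithLogitsLoss()(logits_one, targets.float().unsqueeze(1))

print(loss_a.item(), loss_b.item())
```

Note that in both options the criterion receives raw logits; no softmax or sigmoid is applied by the model.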

Thanks for your reply.

In the latter case, you would use e.g. nn.CrossEntropyLoss, and the target tensor should contain the class indices in the range [0, nb_classes-1] and should not have the “class dimension” (usually the channel dim).

I got it.

Both approaches expect logits, so you should remove your softmax layer and just pass the last output to the criterion.

Thanks for your suggestion to ‘remove your softmax layer and just pass the last output to the criterion’; that’s really the point.

In addition to these, I have a few more questions:

Both approaches expect logits, so you should remove your softmax layer and just pass the last output to the criterion.

  1. Does the above mean the same as this line from the CrossEntropyLoss documentation?

The input is expected to contain raw, unnormalized scores for each class.

  1. But why is that? In my opinion, the inputs of CrossEntropyLoss are:
    (1) the prediction (the output of the model); (2) the label.
    And I thought the predictions should be converted to probabilities, which would mean placing a softmax layer at the end of the model.

Is my understanding wrong that the input of CrossEntropyLoss should be probabilities?
(In other words, is passing the model output to CrossEntropyLoss directly correct from a mathematical standpoint?)

Or, alternatively, does the CrossEntropyLoss function in PyTorch compute the probabilities from the model’s predictions automatically?

I looked at the source code and came to the conclusion.

    def cross_entropy(input, target, weight=None, size_average=None, ignore_index=-100,
                      reduce=None, reduction='mean'):
        if size_average is not None or reduce is not None:
            reduction = _Reduction.legacy_get_string(size_average, reduce)
        return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
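The equivalence shown in the source can be verified numerically (a small check of my own; the tensor shapes are arbitrary):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(3, 2)       # raw, unnormalized scores for each class
target = torch.tensor([0, 1, 1])

# F.cross_entropy is log_softmax followed by nll_loss, as in the source above
loss_ce = F.cross_entropy(logits, target)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), target)

print(torch.allclose(loss_ce, loss_manual))  # True: same computation
```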

From the source code, I’m sure that

Or, alternatively, does the CrossEntropyLoss function in PyTorch compute the probabilities from the model’s predictions automatically?

This is the right answer.
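As a final sanity check (my own sketch, not from the thread): since CrossEntropyLoss applies log_softmax internally, passing probabilities instead of logits would apply softmax twice and distort the loss:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.0], [0.5, 1.5]])
target = torch.tensor([0, 1])

loss_from_logits = F.cross_entropy(logits, target)                   # correct usage
loss_from_probs = F.cross_entropy(F.softmax(logits, dim=1), target)  # softmax applied twice

print(loss_from_logits.item(), loss_from_probs.item())  # the values differ
```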
