For binary classification you can use either `BCELoss` or `BCEWithLogitsLoss`. The difference between the two is that `BCELoss` assumes its input is probabilities, i.e. the output of an `nn.Sigmoid()` in your last layer. `BCEWithLogitsLoss`, on the other hand, assumes its inputs are raw logits that have not passed through a sigmoid activation.

Finally, in both cases (`BCELoss` or `BCEWithLogitsLoss`) the input and target should be 1-dimensional float tensors of size `[batch_size]`.
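A minimal sketch of the two equivalent setups (the tensor values here are made up for illustration):

```python
import torch
import torch.nn as nn

logits = torch.tensor([0.8, -1.2, 2.5])   # raw outputs of the last linear layer
targets = torch.tensor([1.0, 0.0, 1.0])   # float targets, shape [batch_size]

# Option 1: sigmoid inside the model, probabilities into BCELoss
probs = torch.sigmoid(logits)
loss_bce = nn.BCELoss()(probs, targets)

# Option 2: raw logits straight into BCEWithLogitsLoss (no sigmoid layer)
loss_logits = nn.BCEWithLogitsLoss()(logits, targets)

# Both compute the same quantity (up to floating-point precision)
print(loss_bce.item(), loss_logits.item())
```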

What if I use cross-entropy loss on a network with 2 output nodes?

If you have binary class labels 0 and 1, cross-entropy still works. In this case, your network output should have shape `[batch_size, 2]`, and the targets (ground-truth labels) should be a `LongTensor` of class indices, not a float tensor.
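A short sketch of the two-output-node setup (values are made up for illustration):

```python
import torch
import torch.nn as nn

# Raw logits from a network with 2 output nodes, shape [batch_size, 2]
logits = torch.tensor([[1.3, -0.6],
                       [0.2, 0.9]])

# Targets are class indices (LongTensor), NOT float probabilities
targets = torch.tensor([0, 1])

# CrossEntropyLoss applies log-softmax internally, so no softmax layer is needed
loss = nn.CrossEntropyLoss()(logits, targets)
print(loss.item())
```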

Thanks! I also wonder whether the numerical stability differs between these cases?