What output format and loss function should we use for Binary Classification?

In terms of performance.

For binary classification you can use either BCELoss or BCEWithLogitsLoss. The difference between the two is that BCELoss expects its input to be probabilities, typically produced by an nn.Sigmoid() as your last layer. BCEWithLogitsLoss, on the other hand, expects raw logits that have not been passed through a Sigmoid activation, and applies the sigmoid internally.

Finally, in both cases (BCELoss or BCEWithLogitsLoss) the input and target must have the same shape, for example 1-dimensional tensors of size [batch_size], and the target should be a float tensor.
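
To make the difference concrete, here is a minimal sketch comparing the two losses on the same data. The batch size, the random logits, and the target values are made up for illustration; only nn.BCELoss, nn.BCEWithLogitsLoss, and torch.sigmoid come from PyTorch itself.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch_size = 4  # arbitrary size chosen for this example

# Raw model outputs (logits), one per sample: shape [batch_size]
logits = torch.randn(batch_size)

# Float targets with the same shape as the input
targets = torch.tensor([0.0, 1.0, 1.0, 0.0])

# Option 1: pass logits directly; the sigmoid is applied inside the loss
loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)

# Option 2: apply the sigmoid yourself, then pass probabilities to BCELoss
probs = torch.sigmoid(logits)
loss_from_probs = nn.BCELoss()(probs, targets)

# For well-behaved inputs the two values should agree up to rounding
losses_match = torch.allclose(loss_with_logits, loss_from_probs, atol=1e-6)
print(loss_with_logits.item(), loss_from_probs.item(), losses_match)
```

If you go with BCEWithLogitsLoss, remember to apply torch.sigmoid yourself at inference time when you need probabilities.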

What if I use Cross Entropy loss on a network with 2 output nodes?

If you have binary class labels 0 and 1, Cross Entropy still works. In this case the output should have shape [batch_size, 2], and the targets (ground-truth labels) should be a LongTensor of class indices with shape [batch_size], not a float tensor.
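
As a rough sketch of the two-output setup, the snippet below feeds made-up logits of shape [batch_size, 2] and integer labels to nn.CrossEntropyLoss. The tensor values and variable names are illustrative assumptions. It also checks a known identity: with two classes, softmax cross entropy equals binary cross entropy on the difference of the two logits.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two output nodes per sample: shape [batch_size, 2] (values are random here)
two_class_logits = torch.randn(4, 2)

# Class indices as a LongTensor (int64), shape [batch_size]
labels = torch.tensor([0, 1, 1, 0])

# CrossEntropyLoss applies log-softmax internally, so pass raw logits
ce_loss = nn.CrossEntropyLoss()(two_class_logits, labels)

# With two classes, softmax over (z0, z1) gives P(class 1) = sigmoid(z1 - z0),
# so the same loss can be computed with a single logit and float targets
logit_diff = two_class_logits[:, 1] - two_class_logits[:, 0]
bce_loss = nn.BCEWithLogitsLoss()(logit_diff, labels.float())

ce_matches_bce = torch.allclose(ce_loss, bce_loss, atol=1e-6)
print(ce_loss.item(), bce_loss.item(), ce_matches_bce)
```

So the two formulations describe the same model family; the two-node version just carries one redundant degree of freedom in the logits.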

Thanks! I also wonder if the numerical stability would be different in these cases?