Cross Entropy and BCE

I think that, theoretically, BCE and cross entropy should give the same result for binary classification. I have built a model that performs binary classification and have used CrossEntropyLoss for it. I am a bit reluctant to change the model now and was hoping to understand whether the change is actually required. Any help would be appreciated, since I feel the results I am getting are a bit too good to be true.

I think the main difference would be the last linear layer and its corresponding weight matrix.
While you would use a single output neuron with a sigmoid activation for nn.BCELoss, you would use two output neurons for nn.CrossEntropyLoss. This basically doubles the number of parameters in that layer.
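As a minimal sketch of why the two criteria agree numerically: if the two-output logits are constructed as [0, z], the softmax over that pair reduces to sigmoid(z), so CrossEntropyLoss matches BCE on the raw logit z. (This uses nn.BCEWithLogitsLoss, the logit-based variant of nn.BCELoss, and toy tensors in place of a real model's outputs.)

```python
import torch
import torch.nn as nn

# Toy batch: one raw logit z per sample (stand-in for a 1-neuron output layer).
torch.manual_seed(0)
z = torch.randn(4)
targets = torch.tensor([0., 1., 1., 0.])

# Option 1: single output neuron + BCEWithLogitsLoss
# (fuses sigmoid + BCE for numerical stability).
bce = nn.BCEWithLogitsLoss()(z, targets)

# Option 2: two output neurons + CrossEntropyLoss.
# With logits [0, z], softmax gives sigmoid(z) for class 1,
# so the two losses should coincide.
logits_2d = torch.stack([torch.zeros_like(z), z], dim=1)  # shape (4, 2)
ce = nn.CrossEntropyLoss()(logits_2d, targets.long())

print(torch.allclose(bce, ce))
```

In a real two-output model the first logit is not pinned to zero, but only the difference between the two logits matters to softmax, so the learnable function is the same; the second column of weights is the redundancy that doubles the parameter count.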

If you think your results are too good, I would recommend checking for a data leak, as that is a more likely cause than the small difference between the criteria.


Yup, the difference in the number of output dimensions in the final layer was something I found out as well. However, I have a relatively small network, and I can live with "doubling the number of params" in the final layer. I will check the network for a data leak.