I think that, in theory, BCE and cross entropy should give the same result for binary classification. I have coded a model that does binary classification and used CrossEntropyLoss for it. I am a bit reluctant to change the model now and was hoping to understand whether the change is actually required. Any help would be really appreciated, since I feel the results I am getting are a bit too good to be true.
I think the main difference would be the last linear layer and the corresponding weight matrix.
While you would use a single output neuron and a sigmoid activation for nn.BCELoss, you would use two outputs for nn.CrossEntropyLoss. This would basically double the number of parameters in this layer.
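To illustrate why the two setups should agree, here is a minimal pure-Python sketch (not PyTorch code) of the math each criterion computes. It assumes the single logit equals the difference of the two class logits; the helper names `bce_loss` and `ce_loss` and the example values are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_loss(logit, target):
    """Binary cross entropy on sigmoid(logit), the quantity nn.BCELoss
    computes for one sample when given the sigmoid output."""
    p = sigmoid(logit)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def ce_loss(logits, target):
    """Cross entropy on raw class logits: log-sum-exp minus the target logit,
    i.e. log-softmax followed by negative log-likelihood for one sample."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Hypothetical outputs of a two-neuron final layer for one sample.
two_logits = [0.3, 1.7]
# Assumption: the equivalent single-neuron model outputs z1 - z0.
single_logit = two_logits[1] - two_logits[0]

for target in (0, 1):
    print(target, bce_loss(single_logit, target), ce_loss(two_logits, target))
```

Both columns match for either target, because a softmax over two logits reduces to a sigmoid of their difference. The two-output layer just carries redundant parameters: with n input features it has 2n + 2 weights and biases instead of n + 1.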
If you think your results are too good, I would recommend checking for a data leak, as this is more likely the cause than the small difference between the criteria.
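One quick check, as a rough sketch: look for samples that appear in both the training and test splits. The function name `find_overlap` and the toy data are hypothetical, and exact duplicates are only one form of leak (others, such as preprocessing statistics computed on the full dataset, would not show up here).

```python
def find_overlap(train_rows, test_rows):
    """Return the test rows that also appear verbatim in the training rows."""
    train_set = {tuple(row) for row in train_rows}
    return [row for row in test_rows if tuple(row) in train_set]

# Toy feature rows standing in for the real splits.
train = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
test = [[0.3, 0.4], [0.9, 1.0]]

leaked = find_overlap(train, test)
print(f"{len(leaked)} of {len(test)} test rows also occur in the training set")
```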
Yup, the difference in the number of output dimensions was something I found out as well. However, I have a relatively small network and I can live with “doubling the number of params” in the final layer. I will check the network for a data leak.