BCELoss vs BCEWithLogitsLoss

As you described, the only difference is that nn.BCEWithLogitsLoss applies the sigmoid activation internally, whereas nn.BCELoss expects probabilities that have already been passed through a sigmoid.
It’s comparable to nn.CrossEntropyLoss and nn.NLLLoss: the former applies nn.LogSoftmax internally, while for the latter you have to apply it to the model output yourself.
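
Here is a small sketch that checks both equivalences numerically. The tensor shapes, the seed, and the variable names are just assumptions for illustration and are not taken from your model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Binary case: raw scores (logits) and float targets in {0, 1}.
logits = torch.randn(4, 1)
targets = torch.randint(0, 2, (4, 1)).float()

# BCEWithLogitsLoss takes the raw logits directly.
loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)
# BCELoss needs probabilities, so the sigmoid is applied by hand.
loss_manual_sigmoid = nn.BCELoss()(torch.sigmoid(logits), targets)

# Multi-class case: raw scores for 3 classes and integer class indices.
class_logits = torch.randn(4, 3)
class_targets = torch.randint(0, 3, (4,))

# CrossEntropyLoss takes the raw logits directly.
loss_ce = nn.CrossEntropyLoss()(class_logits, class_targets)
# NLLLoss needs log-probabilities, so LogSoftmax is applied by hand.
loss_nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(class_logits), class_targets)

print(torch.allclose(loss_with_logits, loss_manual_sigmoid, atol=1e-6))
print(torch.allclose(loss_ce, loss_nll, atol=1e-6))
```

Both comparisons should agree up to floating-point tolerance. If your model outputs raw logits, the fused criteria (nn.BCEWithLogitsLoss, nn.CrossEntropyLoss) are usually the better choice, since the PyTorch docs describe nn.BCEWithLogitsLoss as more numerically stable than a separate sigmoid followed by nn.BCELoss.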
