As you described, the only difference is the sigmoid activation that nn.BCEWithLogitsLoss applies internally.
It's comparable to the relationship between nn.CrossEntropyLoss and nn.NLLLoss: the former applies nn.LogSoftmax internally, while with the latter you have to add it yourself before the criterion.
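Here is a minimal sketch of both equivalences. It assumes the criterion being compared with nn.BCEWithLogitsLoss is nn.BCELoss; the tensor shapes, the seed, and the variable names are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Binary case: raw logits and float targets in {0, 1} (shapes are arbitrary).
logits = torch.randn(4, 1)
targets = torch.randint(0, 2, (4, 1)).float()

# BCEWithLogitsLoss takes raw logits and applies the sigmoid internally.
loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)
# BCELoss expects probabilities, so the sigmoid is applied by hand.
loss_manual_sigmoid = nn.BCELoss()(torch.sigmoid(logits), targets)

# Multi-class case: raw logits of shape (batch, classes) and class indices.
class_logits = torch.randn(4, 3)
class_targets = torch.randint(0, 3, (4,))

# CrossEntropyLoss applies log-softmax internally.
loss_ce = nn.CrossEntropyLoss()(class_logits, class_targets)
# NLLLoss expects log-probabilities, so LogSoftmax is applied by hand.
log_probs = nn.LogSoftmax(dim=1)(class_logits)
loss_nll = nn.NLLLoss()(log_probs, class_targets)

# Both pairs should agree up to floating point error.
bce_match = torch.allclose(loss_with_logits, loss_manual_sigmoid, atol=1e-6)
ce_match = torch.allclose(loss_ce, loss_nll, atol=1e-6)
print(bce_match, ce_match)
```

In practice the fused versions (nn.BCEWithLogitsLoss and nn.CrossEntropyLoss) are usually the safer choice, since they are more numerically stable than applying the activation separately.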