BCELoss vs BCEWithLogitsLoss

ptrblck · January 2, 2019, 11:24am

As you described, the only difference is that nn.BCEWithLogitsLoss includes the sigmoid activation, so it expects raw logits, while nn.BCELoss expects probabilities.
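A minimal sketch of that equivalence, assuming randomly generated logits and binary targets (the tensor shapes and the variable names are made up for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical raw model outputs (logits) and binary targets, for illustration only.
logits = torch.randn(4, 3)
targets = torch.randint(0, 2, (4, 3)).float()

# nn.BCEWithLogitsLoss applies the sigmoid internally, so it takes raw logits.
loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)

# nn.BCELoss expects probabilities, so the sigmoid has to be applied first.
loss_bce = nn.BCELoss()(torch.sigmoid(logits), targets)

# Both values should agree up to floating point error.
bce_match = torch.allclose(loss_with_logits, loss_bce, atol=1e-6)
print(loss_with_logits.item(), loss_bce.item(), bce_match)
```

For moderate logits like these the two results should match. The PyTorch docs note that nn.BCEWithLogitsLoss is more numerically stable than a separate sigmoid followed by nn.BCELoss, since it uses the log-sum-exp trick internally.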
The relationship is comparable to that between nn.CrossEntropyLoss and nn.NLLLoss. While the former applies nn.LogSoftmax internally, with the latter you would have to apply it to the model output yourself before passing it to the criterion.
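A small sketch of the multi-class case, assuming random logits for 5 classes and random class indices as targets (shapes and names are made up for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical logits for a batch of 4 samples and 5 classes, plus class-index targets.
logits = torch.randn(4, 5)
targets = torch.randint(0, 5, (4,))

# nn.CrossEntropyLoss applies log-softmax internally, so it takes raw logits.
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# nn.NLLLoss expects log-probabilities, so nn.LogSoftmax is applied explicitly.
log_probs = nn.LogSoftmax(dim=1)(logits)
loss_nll = nn.NLLLoss()(log_probs, targets)

# Both values should agree up to floating point error.
ce_nll_match = torch.allclose(loss_ce, loss_nll, atol=1e-6)
print(loss_ce.item(), loss_nll.item(), ce_nll_match)
```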