@ptrblck Isn't it the other way around? I thought BCELoss needs to receive the outputs of a Sigmoid activation as its input, whereas BCEWithLogitsLoss needs the raw logits instead of Sigmoid outputs, since it applies the sigmoid internally.
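To illustrate what I mean, here is a minimal sketch; `logits` and `target` are just placeholder names for raw model outputs and binary labels:

```python
import torch
import torch.nn as nn

# Raw, pre-activation model outputs (logits) and binary targets
logits = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)

# BCEWithLogitsLoss applies the sigmoid internally, so it takes logits directly
loss_with_logits = nn.BCEWithLogitsLoss()(logits, target)

# BCELoss expects probabilities, so the sigmoid must be applied first
loss_plain = nn.BCELoss()(torch.sigmoid(logits), target)

# Up to floating-point error, both should give the same value
print(torch.allclose(loss_with_logits, loss_plain))  # True
```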
And indeed, the example in the docs does apply a Sigmoid (via `m`) prior to BCELoss:
### Example from the PyTorch docs:

```python
>>> m = nn.Sigmoid()
>>> loss = nn.BCELoss()
>>> input = torch.randn(3, requires_grad=True)
>>> target = torch.empty(3).random_(2)
>>> output = loss(m(input), target)
>>> output.backward()
```
So, given the raw `input` above, I suppose the loss should be computed as:

```python
probs = torch.sigmoid(input)  # same as m(input)
output = loss(probs, target)
```
Is that right?