@ptrblck Isn't it the other way around? I thought BCELoss needs to receive the outputs of a Sigmoid activation as its input, whereas BCEWithLogitsLoss needs the raw logits instead of Sigmoid outputs, since it applies the sigmoid internally.
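To illustrate what I mean, here is a minimal sketch; `logits` and `target` are just placeholder names for raw model outputs and binary labels:

```python
import torch
import torch.nn as nn

# Raw, pre-activation model outputs (logits) and binary targets
logits = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)

# BCEWithLogitsLoss applies the sigmoid internally, so it takes logits directly
loss_with_logits = nn.BCEWithLogitsLoss()(logits, target)

# BCELoss expects probabilities, so the sigmoid must be applied first
loss_plain = nn.BCELoss()(torch.sigmoid(logits), target)

# Up to floating-point error, both should give the same value
print(torch.allclose(loss_with_logits, loss_plain))  # True
```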
And indeed, the example in the docs does apply a Sigmoid (via `m`) prior to BCELoss:
### Example from the PyTorch docs:

```python
>>> m = nn.Sigmoid()
>>> loss = nn.BCELoss()
>>> input = torch.randn(3, requires_grad=True)
>>> target = torch.empty(3).random_(2)
>>> output = loss(m(input), target)
>>> output.backward()
```
So, given the raw `input` above, I suppose the loss should be computed as:

```python
probs = torch.sigmoid(input)  # same as m(input)
output = loss(probs, target)
```
Is that right?