Simply because without a sigmoid activation your model outputs raw logits, which are not guaranteed to lie between 0 and 1.
As the name implies, BCEWithLogitsLoss
computes binary cross-entropy directly from the raw logits, while BCELoss
expects probabilities in [0, 1] as mentioned in the docs (BCELoss — PyTorch 2.1 documentation).
See past discussion here: BCELoss vs BCEWithLogitsLoss
So there are two options:
model(input) → logits → BCEWithLogitsLoss → loss
model(input) → logits → torch.sigmoid → BCELoss → loss
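The two options above can be sketched as follows; a minimal example with made-up logits and targets, showing that both paths produce the same loss value:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 1)                        # raw, unbounded model outputs
targets = torch.tensor([[1.], [0.], [1.], [0.]])  # binary labels as floats

# Option 1: feed raw logits directly
loss_a = nn.BCEWithLogitsLoss()(logits, targets)

# Option 2: squash logits to probabilities first, then use BCELoss
probs = torch.sigmoid(logits)
loss_b = nn.BCELoss()(probs, targets)

print(torch.allclose(loss_a, loss_b, atol=1e-6))
```

Note that option 1 is generally preferred because BCEWithLogitsLoss applies the log-sum-exp trick internally, making it more numerically stable for large-magnitude logits.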
I would recommend using the same steps during both training and testing to avoid discrepancies.