Using BCEWithLogitsLoss in training and BCELoss for test

Simply because without a sigmoid activation your model outputs raw logits, which are not guaranteed to be bounded between 0 and 1.

As the name implies, BCEWithLogitsLoss can compute binary cross-entropy directly from the raw logits (in a numerically stable way), while BCELoss expects probabilities in [0, 1] as its input, as mentioned in the docs (BCELoss — PyTorch 2.1 documentation)

See past discussion here: BCELoss vs BCEWithLogitsLoss

So there are two options:

  1. model(input) → logits → BCEWithLogitsLoss → loss
  2. model(input) → logits → torch.sigmoid → BCELoss → loss

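A quick sketch showing that the two options produce the same loss value (tensor shapes and names here are just for illustration):

```python
import torch
import torch.nn as nn

logits = torch.randn(4, 1)                       # raw, unbounded model outputs
targets = torch.randint(0, 2, (4, 1)).float()    # binary targets as floats

# Option 1: feed raw logits to BCEWithLogitsLoss (numerically stable)
loss1 = nn.BCEWithLogitsLoss()(logits, targets)

# Option 2: apply sigmoid first, then BCELoss on the probabilities
loss2 = nn.BCELoss()(torch.sigmoid(logits), targets)

print(torch.allclose(loss1, loss2, atol=1e-6))
```

Option 1 is generally preferred because it applies the log-sum-exp trick internally, avoiding overflow for large-magnitude logits.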
I would recommend using the same steps during both training and testing to avoid discrepancies.