Simply because without a sigmoid activation your model outputs raw logits, which are not guaranteed to lie between 0 and 1.
As the name implies, BCEWithLogitsLoss can compute binary cross-entropy directly from the raw logits, while BCELoss expects probabilities in [0, 1] as its input, as mentioned in the docs (BCELoss — PyTorch 2.1 documentation)
See past discussion here: BCELoss vs BCEWithLogitsLoss
So there are two options:
1. model(input) → logits → BCEWithLogitsLoss → loss
2. model(input) → logits → torch.sigmoid → BCELoss → loss
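A small sketch showing that the two options produce the same loss value (the tensor shapes and the random data are just for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 1)                      # raw model outputs (unbounded)
targets = torch.randint(0, 2, (4, 1)).float()   # binary targets

# Option 1: BCEWithLogitsLoss works on the raw logits directly
loss1 = nn.BCEWithLogitsLoss()(logits, targets)

# Option 2: apply sigmoid first, then BCELoss on the probabilities
loss2 = nn.BCELoss()(torch.sigmoid(logits), targets)

print(loss1, loss2)
print(torch.allclose(loss1, loss2))  # both options agree
```

Option 1 is generally preferred since it uses the log-sum-exp trick internally and is therefore more numerically stable than applying sigmoid and BCELoss separately.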
I would recommend using the same steps during both training and testing to avoid discrepancies