Just to clarify: when using `nn.BCEWithLogitsLoss()(output, target)`, should `output` be passed through a sigmoid first and only then given to `BCEWithLogitsLoss`? I don't understand why one would pass it through a sigmoid twice, since `output` is already a probability after passing through one sigmoid.
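
For what it's worth, here is a minimal sketch one could run to check this. It relies on the documented behavior that `nn.BCEWithLogitsLoss` combines a sigmoid with `BCELoss` internally; the tensor values and the names `logits`, `target`, `loss_with_logits`, `loss_manual`, and `loss_double` are made up for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical raw model outputs (logits); no sigmoid has been applied yet.
logits = torch.tensor([2.0, -1.0, 0.5, -3.0])
target = torch.tensor([1.0, 0.0, 1.0, 0.0])

# BCEWithLogitsLoss applies the sigmoid internally, so it takes raw logits.
# Note the argument order: (input, target).
loss_with_logits = nn.BCEWithLogitsLoss()(logits, target)

# Same quantity computed by hand: one sigmoid, then plain BCELoss.
loss_manual = nn.BCELoss()(torch.sigmoid(logits), target)

# Feeding probabilities into BCEWithLogitsLoss applies the sigmoid twice
# and yields a different loss value.
loss_double = nn.BCEWithLogitsLoss()(torch.sigmoid(logits), target)

print(loss_with_logits.item(), loss_manual.item(), loss_double.item())
```

If the first two printed values agree and the third does not, that would suggest the sigmoid should be applied only once: either inside `BCEWithLogitsLoss` (pass raw logits) or by hand before `BCELoss`, but not both.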