BCEWithLogitsLoss() is used for multi-label classification. Since BCEWithLogitsLoss() already combines a sigmoid layer with BCELoss, there is no need to add a sigmoid layer to the model when training. When evaluating, though, should we add a sigmoid layer to the model and then binarize the activations with a 0.5 threshold?
Dataset
input:
[0, 0, 1, 1, 0, 0, 0, 1]
target (label):
[0, 1, 1, 1, 0]
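
To make the question concrete, here is a minimal sketch of the setup I have in mind, using the sample above. The single `nn.Linear` layer is only a hypothetical stand-in for the real model, and the names `model`, `criterion`, `x`, and `y` are my own. During training the raw logits go straight into BCEWithLogitsLoss(); during evaluation the sigmoid is applied outside the model and the result is thresholded at 0.5.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Sample from the post: 8 input features, 5 independent binary labels.
x = torch.tensor([[0., 0., 1., 1., 0., 0., 0., 1.]])
y = torch.tensor([[0., 1., 1., 1., 0.]])

# Hypothetical stand-in model: outputs raw logits, no sigmoid layer.
model = nn.Linear(8, 5)
criterion = nn.BCEWithLogitsLoss()

# Training step: the loss applies the sigmoid internally.
model.train()
logits = model(x)
loss = criterion(logits, y)
loss.backward()

# Evaluation: apply the sigmoid explicitly, then binarize at 0.5.
model.eval()
with torch.no_grad():
    eval_logits = model(x)
    probs = torch.sigmoid(eval_logits)   # per-label probabilities in (0, 1)
    preds = (probs > 0.5).int()          # 0/1 prediction for each label
    # Mathematically, sigmoid(z) > 0.5 exactly when z > 0,
    # so thresholding the logits at 0 gives the same labels.
    preds_from_logits = (eval_logits > 0).int()
```

If this is right, the sigmoid does not need to live inside the model at all: it can be applied in the evaluation code, or skipped entirely by thresholding the logits at 0.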