I was wondering what the correct way is to perform element-wise binary classification using BCELoss.

My model outputs a tensor of shape (depth, width, length), and its last activation is an element-wise Sigmoid.

My target contains either 0 or 1 for each element and also has shape (depth, width, length).

I am currently computing the loss with the default reduction “mean” and backpropagating that scalar.
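A minimal sketch of that setup, with placeholder shapes and a random tensor standing in for the network output:

```python
import torch
import torch.nn as nn

# Placeholder shapes; the real model produces (depth, width, length).
depth, width, length = 4, 8, 8

# Stand-in for the model's output after the final element-wise Sigmoid.
logits = torch.randn(depth, width, length, requires_grad=True)
probs = torch.sigmoid(logits)

# Target of 0s and 1s with the same shape.
target = torch.randint(0, 2, (depth, width, length)).float()

criterion = nn.BCELoss()         # default reduction="mean"
loss = criterion(probs, target)  # scalar mean loss
loss.backward()                  # backpropagate the mean loss
```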

The mean loss decreases as expected; however, my model predicts probabilities lower than 0.5 for all pixels, so every element equal to 1 is misclassified.

I think this is mainly because 1s are rare in the target tensor.

Should I change the loss? In particular, should I use reduction='none'? If so, how can I backpropagate this element-wise loss?

As an aside, you will have better numerical stability if you use BCEWithLogitsLoss and remove the final Sigmoid layer.
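A minimal sketch of the same setup with the final Sigmoid removed and BCEWithLogitsLoss applied to the raw logits (placeholder shapes again):

```python
import torch
import torch.nn as nn

depth, width, length = 4, 8, 8

# Raw (pre-Sigmoid) output of the network -- no final Sigmoid layer.
logits = torch.randn(depth, width, length, requires_grad=True)
target = torch.randint(0, 2, (depth, width, length)).float()

criterion = nn.BCEWithLogitsLoss()  # applies log-sigmoid internally; more stable
loss = criterion(logits, target)
loss.backward()
```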

Consider using the pos_weight argument passed to the constructor
of BCEWithLogitsLoss to compensate for the rarity of “1” pixels.

A typical value for pos_weight to reweight the rare values would be approx_number_of_0_pixels / approx_number_of_1_pixels.

(The value for pos_weight does not need to be especially precise.)
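One way to set it (a sketch that estimates the ratio from a single placeholder target; you could just as well count over your whole dataset):

```python
import torch
import torch.nn as nn

depth, width, length = 4, 8, 8
logits = torch.randn(depth, width, length, requires_grad=True)

# Toy target in which 1s are rare (roughly 5% of the pixels).
target = (torch.rand(depth, width, length) < 0.05).float()

# Assumes the target contains at least one positive pixel.
num_ones = target.sum()
num_zeros = target.numel() - num_ones
pos_weight = num_zeros / num_ones   # roughly 19 for 5% positives

# A single scalar tensor applies the same weight to every element.
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
loss = criterion(logits, target)
loss.backward()
```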

Note: You don’t need to use 0.5 as the threshold to convert predicted
probabilities (or 0.0 to threshold logits) to 0/1 values. You can lower
the threshold to predict more 1s (but using pos_weight is likely to be
the better approach).
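For example (the 0.25 threshold below is purely illustrative):

```python
import torch

depth, width, length = 4, 8, 8
logits = torch.randn(depth, width, length)

# Default: threshold probabilities at 0.5 (equivalently, logits at 0.0).
pred_default = (torch.sigmoid(logits) > 0.5).long()

# Lower the threshold to predict more 1s, e.g. 0.25.
pred_more_ones = (torch.sigmoid(logits) > 0.25).long()
```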