As an aside, you will have better numerical stability if you use BCEWithLogitsLoss (instead of BCELoss) and remove the final Sigmoid layer.
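As a rough illustration, here is a minimal sketch comparing the two setups on a single pixel whose raw model output (logit) is very large. The logit value of 200.0 is made up for the example:

```python
import torch
import torch.nn as nn

# One pixel with a very confident (and wrong) raw model output.
logit = torch.tensor([200.0])
target = torch.tensor([0.0])

# Final Sigmoid layer + BCELoss: sigmoid(200.0) rounds to exactly 1.0 in float32,
# so log(1 - p) becomes log(0); BCELoss clamps the log to -100, capping the loss at 100.
loss_sigmoid_bce = nn.BCELoss()(torch.sigmoid(logit), target)

# No Sigmoid layer + BCEWithLogitsLoss: the loss is computed directly from the logit,
# so it comes out at the correct value of about 200.
loss_with_logits = nn.BCEWithLogitsLoss()(logit, target)

print(loss_sigmoid_bce.item(), loss_with_logits.item())
```

The clamp keeps BCELoss from returning inf, but the result is still wrong, whereas BCEWithLogitsLoss never has to take the log of a probability that has rounded to 0 or 1.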
Consider passing the pos_weight argument to the BCEWithLogitsLoss constructor
to compensate for the rarity of “1” pixels.
A typical value for pos_weight to upweight the rare “1” pixels would be approx_number_of_0_pixels / approx_number_of_1_pixels.
(The value for pos_weight does not need to be especially precise.)
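A minimal sketch of estimating pos_weight from ground-truth masks and passing it to the loss. The helper estimate_pos_weight and the toy 2 x 1 x 10 x 10 masks are hypothetical; in practice you would estimate the ratio from (a sample of) your training masks:

```python
import torch
import torch.nn as nn

def estimate_pos_weight(masks: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: ratio of "0" pixels to "1" pixels in binary masks."""
    num_pos = masks.sum()
    num_neg = masks.numel() - num_pos
    # clamp avoids division by zero if the sampled masks contain no "1" pixels
    return (num_neg / num_pos.clamp(min=1)).reshape(1)

# Toy masks: 2 images, 1 channel, 10x10 pixels, with 10 "1" pixels in total.
masks = torch.zeros(2, 1, 10, 10)
masks[0, 0, :2, :5] = 1.0

pos_weight = estimate_pos_weight(masks)  # 190 / 10 = 19
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(2, 1, 10, 10)  # stand-in for the model's raw output (no Sigmoid)
loss = criterion(logits, masks)
```

The one-element pos_weight tensor broadcasts over all pixels, so every “1” pixel's loss term is scaled up by the same factor.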
Note: You don’t need to use 0.5 as the threshold to convert predicted
probabilities (or 0.0 to threshold logits) to 0/1 values. You can lower
the threshold to predict more 1s (but using pos_weight is likely to be
the better approach).
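A short sketch of the thresholding point, using made-up logits. Because sigmoid is monotonic and sigmoid(0.0) == 0.5, thresholding logits at 0.0 gives the same predictions as thresholding probabilities at 0.5, and a lower probability threshold can be converted to a logit threshold with torch.logit (the inverse of sigmoid):

```python
import torch

logits = torch.tensor([-2.0, -0.5, 0.3, 1.5])  # made-up raw model outputs

# Default: logit threshold 0.0, equivalent to probability threshold 0.5.
preds_default = (logits > 0.0).long()
preds_from_probs = (torch.sigmoid(logits) > 0.5).long()

# Lower probability threshold (0.3, chosen arbitrarily) predicts more 1s.
prob_threshold = 0.3
logit_threshold = torch.logit(torch.tensor(prob_threshold))  # about -0.847
preds_lower = (logits > logit_threshold).long()

print(preds_default.tolist(), preds_lower.tolist())
```

Here the pixel with logit -0.5 (probability about 0.38) flips from 0 to 1 once the threshold is lowered.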