If I understand correctly, BCEWithLogitsLoss() may be more appropriate for your problem.
I am assuming that you did not apply any activation function after your last conv-1x1 layer, so its output is raw logits rather than probabilities. To compare it against your y-label (0: background, 1: segmentation mask), the output has to pass through a sigmoid, which maps it to values between 0 and 1. That is why BCEWithLogitsLoss() is a natural choice for your application: it applies the sigmoid internally before calculating the binary cross entropy loss, so you can feed it the raw output directly and should not add a separate Sigmoid layer in front of it.
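To make this concrete, here is a minimal sketch showing that BCEWithLogitsLoss() on raw outputs gives the same value as applying a sigmoid and then BCELoss(). The tensor shapes and random values are made up for illustration and are not taken from your model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical shapes: batch of 2, 1 output channel, 4x4 map.
# `logits` stands in for the raw output of the last conv-1x1 layer (no activation).
logits = torch.randn(2, 1, 4, 4)
# Target must be float and have the same shape: 0 = background, 1 = segmentation mask.
target = torch.randint(0, 2, (2, 1, 4, 4)).float()

# Option A: sigmoid is applied inside the loss.
loss_with_logits = nn.BCEWithLogitsLoss()(logits, target)

# Option B: explicit sigmoid followed by plain binary cross entropy.
loss_manual = nn.BCELoss()(torch.sigmoid(logits), target)

print(loss_with_logits.item(), loss_manual.item())
```

The two values should agree up to floating-point error; the combined version is generally preferred because it is more numerically stable. At inference time you would still apply torch.sigmoid to the output yourself (and threshold it, e.g. at 0.5, if you need a binary mask).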
Hope that helps.