Why does my BCE loss output differ?

How does PyTorch calculate the BCE loss when the output has shape (bs, nc, h, w), as in segmentation?
I get different results when I compute it manually:
X = -mask * log(sigmoid(output)) - (1 - mask) * log(1 - sigmoid(1 - output))
X.mean(0).sum()
Here the loss starts at around 3k and ends in NaN.

When I use F.binary_cross_entropy_with_logits, the loss ranges from 0.5 to 1, and there is no NaN.
Why is there a difference? Am I missing a step for aggregating the BCE loss?

Your formula might create Infs or NaNs, e.g. if you call torch.log on a zero or a negative input.
What’s the reason you are reimplementing the bce loss manually?
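To illustrate the point about torch.log, here is a minimal sketch (the logits and targets are made-up values, not taken from the original model). In float32, sigmoid saturates to exactly 0 or 1 for large-magnitude logits, so the manual formula produces NaN or Inf, while F.binary_cross_entropy_with_logits stays finite because it works on the logits directly.

```python
import torch
import torch.nn.functional as F

# Made-up logits and targets, chosen so that sigmoid saturates.
logits = torch.tensor([-200.0, 0.0, 200.0])
target = torch.tensor([0.0, 1.0, 0.0])

# Manual BCE: sigmoid(-200) is exactly 0 and sigmoid(200) is exactly 1
# in float32, so log() receives a zero in both cases.
p = torch.sigmoid(logits)
naive = -(target * torch.log(p) + (1 - target) * torch.log(1 - p))

# Built-in version: numerically stable, computed from the logits.
stable = F.binary_cross_entropy_with_logits(logits, target, reduction="none")

print(naive)   # first element is NaN (0 * -inf), last element is Inf
print(stable)  # all elements are finite
```

The middle element, where sigmoid is not saturated, agrees between the two versions.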

I figured out the reason for the NaN: I was using mixed precision, where the constant 1e-12 was being rounded to zero, so I changed it to 1e-7. That resolved it.
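The rounding can be checked directly. This is a small sketch, assuming the reduced-precision dtype was float16 (the post does not say which one was used):

```python
import torch

# Assumption: the mixed-precision dtype is float16.
eps_small = torch.tensor(1e-12, dtype=torch.float16)
eps_large = torch.tensor(1e-7, dtype=torch.float16)

print(eps_small.item())  # 0.0: 1e-12 underflows in float16
print(eps_large.item())  # non-zero, but only as a subnormal value

# Smallest positive normal float16 value, for comparison.
print(torch.finfo(torch.float16).tiny)
```

Since 1e-7 is still below the smallest normal float16 value, it is a fragile guard. Casting the logits to float32 before computing the loss, or using F.binary_cross_entropy_with_logits, is likely a safer choice.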

As for the difference:
The objective is to predict the presence of a vehicle or object on the road at certain spatial coordinates, from a driverless car's camera point of view.
To create the label, I built a mask of shape (X, Y) whose non-zero pixels correspond to the 2D pixel position in camera space projected from the object's 3D position on the road. Basically, the model detects a car at a position based on the mask's non-zero pixels.
The custom loss gives values in the thousands but comes down fast; the standard one gives values between 0 and 1 and stagnates after 0.01.
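One likely source of the scale gap is the reduction. X.mean(0).sum() averages over the batch only and sums over every channel and pixel, while F.binary_cross_entropy_with_logits defaults to reduction="mean", which averages over all elements. The sketch below uses assumed shapes and random data (not the real model) to show that the two differ by a factor of nc * h * w:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Assumed shapes and random data, for illustration only.
bs, nc, h, w = 4, 1, 64, 64
output = torch.randn(bs, nc, h, w)
mask = (torch.rand(bs, nc, h, w) > 0.99).float()  # sparse mask, mostly zeros

# Per-element loss, no aggregation yet.
per_elem = F.binary_cross_entropy_with_logits(output, mask, reduction="none")

# Aggregation from the custom version: mean over batch, sum over the rest.
custom_style = per_elem.mean(0).sum()

# Default aggregation: mean over every element.
default_mean = F.binary_cross_entropy_with_logits(output, mask)

# The two agree once the default is scaled by the number of
# elements per sample.
print(custom_style.item(), default_mean.item() * nc * h * w)
```

Separately, the formula as posted uses sigmoid(1 - output) in the second term. Standard BCE uses 1 - sigmoid(output), which equals sigmoid(-output), so if the code matches the post, that term would also contribute to the mismatch.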