BCE loss reduction

I have a VAE model that uses BCE loss. With reduction=sum the loss is really big, but the results are good. With the default reduction, the loss is nearly 0 and the model doesn’t learn.
I don’t get how the two losses are related. I use a batch size of 64 and a dataset with 190000 images.
I wanted to compare both, but I can’t see a connection. I printed both after the first batch: the summed loss is 2186383.5 and the element-wise mean loss is 0.695.

Well, there should be a connection between the summed and averaged loss.
If you divide the summed loss by the averaged one and by the batch size, you get the number of elements per sample (roughly 49,000), so it looks like your image sizes are approx. 224x224. Would that be a correct guess?

However, back to your original question. If your model learns fine with a huge loss and doesn’t learn at all with the averaged loss, you could try increasing the learning rate and see if you can match both training behaviors.
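To illustrate why the learning rate is the knob to try: the summed loss is the mean loss times the number of elements, so its gradient is larger by the same factor. Here is a minimal plain-Python sketch, assuming a toy one-parameter model `p = sigmoid(w * x)` and made-up data (neither is from your VAE). It only holds exactly for plain SGD; adaptive optimizers such as Adam rescale gradients, so the match may be looser there.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_grads(w, xs, ys):
    """Return (d sum-loss / dw, d mean-loss / dw) for p = sigmoid(w * x)."""
    # For BCE on a sigmoid output, d loss_i / dw = (p_i - y_i) * x_i.
    per_element = [(sigmoid(w * x) - y) * x for x, y in zip(xs, ys)]
    grad_sum = sum(per_element)
    grad_mean = grad_sum / len(per_element)
    return grad_sum, grad_mean

# Hypothetical toy data, only for illustration.
xs = [0.5, -1.2, 2.0, 0.3]
ys = [1.0, 0.0, 1.0, 0.0]
w = 0.1
lr = 1e-3

grad_sum, grad_mean = bce_grads(w, xs, ys)
n = len(xs)

# An SGD step on the summed loss with lr equals a step on the
# mean loss with lr * n.
step_sum = lr * grad_sum
step_mean_scaled = (lr * n) * grad_mean
print(grad_sum, grad_mean, step_sum, step_mean_scaled)
```

In this thread the factor would be the total number of elements in a batch rather than 4, so the learning rate for the averaged loss may need to be much larger than the one that worked with the summed loss.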

My image size is 128x128 with 3 colour channels. I tried dividing, but I couldn’t figure it out. I will try increasing the learning rate, thank you.

Yeah, that fits even a bit better:

2186383.5 / 64. / 3. / 128**2 ~= 0.69503
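The same check as a small runnable snippet, in case it helps to see where each factor comes from. The variable names are just for illustration; the numbers are the ones posted in this thread.

```python
summed_loss = 2186383.5   # loss printed with reduction=sum
batch_size = 64
channels = 3
height = width = 128

# The element-wise mean divides by every element in the batch,
# not only by the batch size.
num_elements = batch_size * channels * height * width
mean_loss = summed_loss / num_elements

print(num_elements)         # 3145728
print(round(mean_loss, 5))  # 0.69503
```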

OK, now I understand what elementwise-mean means. Thank you very much.