Loss function with reduction="sum" gets better results than reduction="mean"

Hi, I found a strange phenomenon and did not know the reason.

I am training a deep denoising autoencoder, which tries to recover the original image from a noisy version given as the input. It should not be a difficult task. I found that when setting reduction="sum", I get the expected result, though the final loss is not small enough. But if I use the default setting reduction="mean", I fail to get a satisfactory result. In this situation, the autoencoder seems to have learned nothing, i.e., it just outputs the noisy version of the input (like a copy operation) instead of recovering a clean one.

Any explanation and suggestion would be appreciated.

The difference in the loss reduction changes the scale of the loss and thus also the gradient magnitudes: with reduction="sum" the gradients are N times larger than with reduction="mean" (where N is the number of summed elements). You could therefore play around with the learning rate (e.g. increase it when using the mean reduction) and try to find a sweet spot.
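To make the scaling concrete, here is a minimal sketch in plain Python (no framework; the prediction and target values are made up for illustration). It computes the gradient of a squared-error loss under both reductions and shows that the sum-reduced gradient is exactly N times the mean-reduced one, so using reduction="sum" behaves like using reduction="mean" with the learning rate multiplied by N:

```python
# Illustrative values, not from the original post.
preds   = [0.2, 0.5, 0.9, 0.4]
targets = [0.0, 1.0, 1.0, 0.0]
N = len(preds)

# Gradient of the squared-error loss w.r.t. each prediction:
# d/dp sum_i (p_i - t_i)^2 = 2 (p_i - t_i)
grad_sum  = [2 * (p - t) for p, t in zip(preds, targets)]  # reduction="sum"
grad_mean = [g / N for g in grad_sum]                      # reduction="mean"

# An SGD step with lr on the "sum" loss equals a step with
# lr * N on the "mean" loss.
lr = 0.01
step_sum  = [lr * g for g in grad_sum]
step_mean = [(lr * N) * g for g in grad_mean]
assert all(abs(a - b) < 1e-12 for a, b in zip(step_sum, step_mean))
```

This is why simply switching the reduction, without retuning the learning rate, can make the optimizer take steps that are N times too small (or too large) for the same model.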