I have seen people write the reconstruction loss in two different ways:
`F.binary_cross_entropy(recon_x1, x1.view(-1, 784))` or `F.binary_cross_entropy(recon_x1, x1.view(-1, 784), reduction="sum")`
I was wondering if there is a theoretical reason to use one over the other?
Hi @Rojin
I believe this comes from the fact that the ELBO's reconstruction term is a sum (or an integral) over the output dimensions.
So the sum reduction is the more paper-faithful approach. The PyTorch default (mean) will still work, but note that it divides by batch_size * 784, which downscales the reconstruction term and its gradients by a constant factor. If the KL term is not rescaled accordingly, this shifts the balance between the two terms (effectively a different beta weighting).
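A quick sketch of how the two reductions relate, using random tensors as stand-ins for a real batch and decoder output (the shapes and names here are illustrative, not from your code):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.rand(8, 784)      # stand-in for a batch of flattened 28x28 images
recon = torch.rand(8, 784)  # stand-in for decoder output in (0, 1)

bce_sum = F.binary_cross_entropy(recon, x, reduction="sum")
bce_mean = F.binary_cross_entropy(recon, x, reduction="mean")

# "mean" divides by every element (batch_size * 784), so the two
# reductions differ by the constant factor x.numel():
print(torch.isclose(bce_sum, bce_mean * x.numel()))
```

So if you use the mean reduction and want the same loss balance as in the paper, you would divide the KL term by the same factor (or multiply the BCE back up).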