I am a bit unsure about the loss function in the example implementation of a VAE on GitHub.
The evidence lower bound (ELBO) can be summarized as:
ELBO = log-likelihood - KL Divergence
And in the context of a VAE, this should be maximized. However, since PyTorch optimizers perform gradient descent (i.e. they minimize a loss), the negative of the ELBO should be minimized instead:
-ELBO = KL Divergence - log-likelihood
However, in the loss function in the code, the loss is defined as:
Binary Cross Entropy + KL Divergence
According to the documentation for the BCE loss, it actually implements the negative log-likelihood function of a Bernoulli distribution, which means that:
BCE + KLD = KLD - log-likelihood = -ELBO
which is exactly the negative ELBO derived above. Is this why the loss is defined in this way in the code?
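For concreteness, here is a minimal sketch of how I understand such a loss is typically written in PyTorch, assuming a Bernoulli decoder (hence BCE) and a diagonal-Gaussian encoder (hence the closed-form KL against a standard normal prior). The names `vae_loss`, `recon_x`, `mu`, and `logvar` are my own, not necessarily those in the repo:

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar):
    # Reconstruction term: BCE is the negative log-likelihood of a
    # Bernoulli decoder, summed over all pixels and batch elements.
    bce = F.binary_cross_entropy(recon_x, x, reduction="sum")
    # KL(q(z|x) || N(0, I)) in closed form for a diagonal Gaussian encoder.
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    # bce + kld = KLD - log-likelihood = -(ELBO), which is minimized.
    return bce + kld
```

With `mu = 0` and `logvar = 0` the KL term vanishes, so the loss reduces to the reconstruction BCE alone, which matches the algebra above.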
Also, does the cross-entropy loss function also implement a negative log-likelihood function?
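My understanding from the PyTorch documentation is that `F.cross_entropy` combines `log_softmax` and `nll_loss`, i.e. it computes the negative log-likelihood of a categorical distribution parameterized by the logits. A quick numerical check of that equivalence:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])
target = torch.tensor([0])

# cross_entropy fuses log_softmax and nll_loss into one call,
# so both expressions give the categorical negative log-likelihood.
ce = F.cross_entropy(logits, target)
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(torch.allclose(ce, nll))
```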