Example implementation of a variational autoencoder

I am a bit unsure about the loss function in the example implementation of a VAE on GitHub.

The evidence lower bound (ELBO) can be summarized as:

ELBO = log-likelihood - KL Divergence

And in the context of a VAE, this quantity should be maximized. However, since PyTorch optimizers perform gradient descent (i.e. minimization), the negative of the ELBO is minimized instead:

-ELBO = KL Divergence - log-likelihood

However, in the code, the loss function is defined as:

Binary Cross Entropy + KL Divergence
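For reference, the loss in such an example is typically written along these lines. This is a sketch, not the exact code from the repo: it assumes a Bernoulli decoder (sigmoid outputs in (0, 1)) and a diagonal Gaussian posterior, and the function name and tensor shapes are illustrative:

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar):
    # Reconstruction term: BCE is the negative log-likelihood of a
    # Bernoulli decoder, so minimizing it maximizes log p(x|z)
    bce = F.binary_cross_entropy(recon_x, x, reduction='sum')
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian q
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    # bce + kld = -log-likelihood + KL = -ELBO, which gradient
    # descent minimizes (equivalently, the ELBO is maximized)
    return bce + kld
```

Note that when `mu = 0` and `logvar = 0`, the posterior equals the prior and the KL term vanishes, so the loss reduces to the reconstruction term alone.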

According to the documentation for the BCE loss, it implements the negative log-likelihood of a Bernoulli distribution, which means that:

BCE + KLD = KLD - log-likelihood
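This equivalence is easy to check numerically: `F.binary_cross_entropy` with `reduction='sum'` matches the negative Bernoulli log-likelihood computed by hand (the probabilities and targets here are arbitrary):

```python
import torch
import torch.nn.functional as F

p = torch.tensor([0.7, 0.2])  # predicted Bernoulli success probabilities
x = torch.tensor([1.0, 0.0])  # binary targets

bce = F.binary_cross_entropy(p, x, reduction='sum')
# Negative log-likelihood of a Bernoulli: -sum[x*log(p) + (1-x)*log(1-p)]
nll = -(x * torch.log(p) + (1 - x) * torch.log(1 - p)).sum()
assert torch.allclose(bce, nll)
```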

which is the same as the expression derived above. Is this why the loss is defined this way in the code?

Also, does the cross-entropy loss function likewise implement a negative log-likelihood?


I have the same question; I don’t know which form is correct.

In my understanding, BCE implements the negative log-likelihood for two classes (a Bernoulli distribution), and CrossEntropy implements it for multiple classes (a categorical distribution).
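This matches the PyTorch documentation: `CrossEntropyLoss` combines `log_softmax` and `NLLLoss`, i.e. it is the negative log-likelihood of a categorical distribution over the classes. A quick numerical check (logits and target here are arbitrary):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])  # unnormalized scores for 3 classes
target = torch.tensor([0])                 # the correct class index

ce = F.cross_entropy(logits, target)
# Negative log-likelihood of a categorical distribution:
# -log softmax(logits) evaluated at the target class
log_probs = F.log_softmax(logits, dim=1)
nll = -log_probs[0, target[0]]
assert torch.allclose(ce, nll)
```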

Did you reach a conclusion about this problem? I am facing the same issue… thank you in advance!