I am a bit unsure about the loss function in the example implementation of a VAE on GitHub.
The evidence lower bound (ELBO) can be summarized as:
ELBO = log-likelihood - KL Divergence
And in the context of a VAE, this should be maximized. However, since PyTorch optimizers perform gradient descent (i.e. they minimize a loss), the negative of the ELBO should be minimized instead:
-ELBO = KL Divergence - log-likelihood
However, in the loss function in the code, the loss is defined as:
Binary Cross Entropy + KL Divergence
According to the documentation for the BCE loss, it actually implements the negative log-likelihood function of a Bernoulli distribution, which means that:
BCE + KLD = KLD - log-likelihood = -ELBO
which is exactly the negative ELBO derived above. Is this why the loss is defined in this way in the code?
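For concreteness, here is a minimal sketch of how I understand such a loss is typically written in PyTorch, assuming a Bernoulli decoder (hence BCE) and a diagonal-Gaussian encoder (hence the closed-form KL against a standard normal prior). The names `vae_loss`, `recon_x`, `mu`, and `logvar` are my own, not necessarily those in the repo:

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar):
    # Reconstruction term: BCE is the negative log-likelihood of a
    # Bernoulli decoder, summed over all pixels and batch elements.
    bce = F.binary_cross_entropy(recon_x, x, reduction="sum")
    # KL(q(z|x) || N(0, I)) in closed form for a diagonal Gaussian encoder.
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    # bce + kld = KLD - log-likelihood = -(ELBO), which is minimized.
    return bce + kld
```

With `mu = 0` and `logvar = 0` the KL term vanishes, so the loss reduces to the reconstruction BCE alone, which matches the algebra above.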
Also, does the cross-entropy loss function also implement a negative log-likelihood function?
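My understanding from the PyTorch documentation is that `F.cross_entropy` combines `log_softmax` and `nll_loss`, i.e. it computes the negative log-likelihood of a categorical distribution parameterized by the logits. A quick numerical check of that equivalence:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])
target = torch.tensor([0])

# cross_entropy fuses log_softmax and nll_loss into one call,
# so both expressions give the categorical negative log-likelihood.
ce = F.cross_entropy(logits, target)
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(torch.allclose(ce, nll))
```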