emanjavacas
(Enrique Manjavacas)
April 26, 2017, 8:36am
#1
Are there any (theoretical) reasons for not taking the batch-average loss in the VAE example?
Right now neither the KL divergence nor the BCE term is averaged over the batch.

return self.decode(z), mu, logvar  # (tail end of VAE.forward)
model = VAE()
if args.cuda:
model.cuda()
def loss_function(recon_x, x, mu, logvar):
BCE = F.binary_cross_entropy(recon_x, x.view(-1, 784))
# see Appendix B from VAE paper:
# Kingma and Welling. Auto-Encoding Variational Bayes. ICLR, 2014
# https://arxiv.org/abs/1312.6114
# 0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
# Normalise by same number of elements as in reconstruction
KLD /= args.batch_size * 784
return BCE + KLD
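For comparison, a batch-averaged variant might look something like the sketch below. It sums both terms per example and divides by the batch size, so the two terms stay on the same scale; it uses today's `reduction='sum'` argument (the 2017 example predates it), and the choice to divide only by batch size rather than also by 784 is an assumption, not what the example does:

```python
import torch
import torch.nn.functional as F

def loss_function_avg(recon_x, x, mu, logvar):
    """Hypothetical batch-averaged VAE loss: sum over elements, mean over batch."""
    batch_size = x.size(0)
    # Reconstruction term: summed BCE, then averaged over the batch
    BCE = F.binary_cross_entropy(
        recon_x, x.view(-1, 784), reduction='sum') / batch_size
    # KL term: -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2), averaged over the batch
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / batch_size
    return BCE + KLD
```

With this normalization the relative weight of BCE and KLD is unchanged compared to summing both; only the overall scale (and hence the effective learning rate) differs.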

smth
April 26, 2017, 11:46pm
#2
I don't think there are strong theoretical reasons. Joost (the original author of that code) was porting some existing code over exactly.

emanjavacas
(Enrique Manjavacas)
May 5, 2017, 12:27pm
#3
Alright, I was just wondering, thanks!