emanjavacas
(Enrique Manjavacas)
April 26, 2017, 8:36am
#1
Are there any (theoretical) reasons for not taking the batch-average loss in the VAE example?
Right now neither the KL divergence nor the BCE term is averaged over the batch.

return self.decode(z), mu, logvar  # (tail end of VAE.forward)
model = VAE()
if args.cuda:
model.cuda()
def loss_function(recon_x, x, mu, logvar):
BCE = F.binary_cross_entropy(recon_x, x.view(-1, 784))
# see Appendix B from VAE paper:
# Kingma and Welling. Auto-Encoding Variational Bayes. ICLR, 2014
# https://arxiv.org/abs/1312.6114
# 0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
# Normalise by same number of elements as in reconstruction
KLD /= args.batch_size * 784
return BCE + KLD
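For comparison, a batch-averaged variant might look something like the sketch below. It sums both terms per example and divides by the batch size, so the two terms stay on the same scale; it uses today's `reduction='sum'` argument (the 2017 example predates it), and the choice to divide only by batch size rather than also by 784 is an assumption, not what the example does:

```python
import torch
import torch.nn.functional as F

def loss_function_avg(recon_x, x, mu, logvar):
    """Hypothetical batch-averaged VAE loss: sum over elements, mean over batch."""
    batch_size = x.size(0)
    # Reconstruction term: summed BCE, then averaged over the batch
    BCE = F.binary_cross_entropy(
        recon_x, x.view(-1, 784), reduction='sum') / batch_size
    # KL term: -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2), averaged over the batch
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / batch_size
    return BCE + KLD
```

With this normalization the relative weight of BCE and KLD is unchanged compared to summing both; only the overall scale (and hence the effective learning rate) differs.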

smth
April 26, 2017, 11:46pm
#2
I don't think there are strong theoretical reasons. Joost (the original author of that code) was porting some existing code over exactly.

emanjavacas
(Enrique Manjavacas)
May 5, 2017, 12:27pm
#3
Alright, I was just wondering, thanks!