# KL divergence between two distributions with diagonal covariance for VAE - is there an efficient way to do this?

I am training a VAE and want the prior distribution to have non-unit variance, but still be diagonal.

I know that there is this closed-form equation for the KL divergence between two multivariate Gaussians:

$$ D_{KL}\big(\mathcal{N}(\mu_1, \Sigma_1) \,\|\, \mathcal{N}(\mu_2, \Sigma_2)\big) = \frac{1}{2}\left[ \log\frac{|\Sigma_2|}{|\Sigma_1|} - d + \mathrm{tr}\big(\Sigma_2^{-1}\Sigma_1\big) + (\mu_2 - \mu_1)^\top \Sigma_2^{-1} (\mu_2 - \mu_1) \right] $$

And for diagonal matrices this simplifies to:

$$ D_{KL} = \frac{1}{2}\sum_i \left[ \log\sigma_2^{(i)} - \log\sigma_1^{(i)} + \frac{\sigma_1^{(i)}}{\sigma_2^{(i)}} + \frac{\big(\mu_2^{(i)} - \mu_1^{(i)}\big)^2}{\sigma_2^{(i)}} - 1 \right] $$

where $\sigma_1^{(i)}$ and $\sigma_2^{(i)}$ are the diagonal elements (variances) of $\Sigma_1$ and $\Sigma_2$ respectively.

Implementing this naively (with a zero-mean prior, so `mu2 = 0`, and prior log-variance `logVar2`) as:

KL = 0.5 * torch.sum(logVar2 - logVar + (torch.exp(logVar) + mu ** 2) / torch.exp(logVar2) - 1)
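For concreteness, here is a self-contained sketch of that closed-form KL as a function (the names `mu1`, `logVar1`, `mu2`, `logVar2` are placeholders of mine; it assumes both distributions are diagonal Gaussians parameterized by mean and log-variance):

```python
import torch

def diag_gaussian_kl(mu1, logVar1, mu2, logVar2):
    # KL( N(mu1, diag(exp(logVar1))) || N(mu2, diag(exp(logVar2))) ),
    # summed over all elements (batch and latent dimensions alike).
    return 0.5 * torch.sum(
        logVar2 - logVar1
        + (torch.exp(logVar1) + (mu1 - mu2) ** 2) / torch.exp(logVar2)
        - 1.0
    )
```

With `mu2 = 0` and `logVar2 = 0` this reduces to the usual unit-variance VAE KL term.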

where the re-parameterization is done like this:

z = mu(x) + sigma(x) * eps, eps ~ N(0, I)

Is there a more efficient way to do this? Computation time has gone up significantly compared to using the regular KL loss.
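As a sanity check on the formula itself, the closed-form sum can be compared against `torch.distributions.kl_divergence` (a sketch with made-up inputs; `kl_divergence` between two `Normal`s is also evaluated in closed form, so it is another reasonably efficient option):

```python
import torch
from torch.distributions import Normal, kl_divergence

torch.manual_seed(0)
mu1, logVar1 = torch.randn(4), torch.randn(4)
mu2, logVar2 = torch.randn(4), torch.randn(4)

# Closed-form diagonal KL, summed over dimensions.
closed_form = 0.5 * torch.sum(
    logVar2 - logVar1
    + (torch.exp(logVar1) + (mu1 - mu2) ** 2) / torch.exp(logVar2)
    - 1.0
)

# Library version: a diagonal Gaussian factorizes into independent 1-D Normals,
# so the per-dimension KLs simply add up.
p = Normal(mu1, torch.exp(0.5 * logVar1))  # Normal takes std-dev, not variance
q = Normal(mu2, torch.exp(0.5 * logVar2))
library = kl_divergence(p, q).sum()

print(torch.allclose(closed_form, library, atol=1e-5))  # True
```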

Would it make sense to change the re-parameterization to:

z = mu(x) + sigma(x) * eps, eps ~ N(0, \Sigma_2)

and then use the KL loss that is normally used for a unit-variance prior?
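To see what that change would actually sample (a quick empirical sketch with made-up scalar values, and mu = 0 for simplicity): if eps ~ N(0, \Sigma_2), then z = mu + sigma * eps has variance sigma^2 * sigma_2 per dimension, i.e. the prior variance multiplies into the posterior variance rather than leaving it at sigma^2:

```python
import torch

torch.manual_seed(0)
sigma = 2.0        # hypothetical encoder std-dev output (scalar for simplicity)
prior_var = 4.0    # hypothetical diagonal element of Sigma_2

# eps ~ N(0, Sigma_2) instead of eps ~ N(0, I)
eps = prior_var ** 0.5 * torch.randn(1_000_000)
z = sigma * eps    # mu = 0 for simplicity

# Empirically Var(z) is about sigma^2 * prior_var = 16, not sigma^2 = 4
print(z.var().item())
```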