Possibly related, since OP is (was) trying to simultaneously learn mean and variance:
- machine learning - Is the optimization of the Gaussian VAE well-posed? - Cross Validated
- “Unfortunately, joint optimization of both mean and variance in the VAE leads to the well-known problem of shrinkage or underestimation of variance.” [2010.09042] Addressing Variance Shrinkage in Variational Autoencoders using Quantile Regression