Relationship between batch size and Variational Autoencoder training

I’ve been attempting to implement a Variational Autoencoder (VAE), and my test example (MNIST) works quite well. However, my actual data is rather memory-intensive, so I have to limit the batch size to something like 5 images.

I’m wondering whether the smaller batch size has any effect on the KL loss, since, as I understand it, that term pushes the latent codes to be spread out according to a standard Gaussian. Is it possible that a VAE needs a reasonably large batch size to be adequately trained?
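To make the question concrete, this is a minimal sketch of the KL term I mean, i.e. the standard closed-form KL divergence between a diagonal Gaussian posterior and N(0, I). It assumes an encoder that outputs `mu` and `log_var` (the names and the PyTorch framing are just illustrative, not my exact code):

```python
import torch

def kl_loss(mu, log_var):
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, I).
    # The sum runs over the latent dimensions of EACH sample; the batch
    # only enters through the final mean, so (as I understand it) a small
    # batch gives a noisier estimate of the same objective.
    kl_per_sample = -0.5 * torch.sum(
        1 + log_var - mu.pow(2) - log_var.exp(), dim=1
    )
    return kl_per_sample.mean()
```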

The reparameterization trick adds Gaussian noise to the latent vector, and I’m wondering whether this has a detrimental effect with a small batch size.
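By “adding Gaussian noise” I mean the usual sampling step, roughly like the sketch below (again assuming `mu`/`log_var` encoder outputs; the helper name is illustrative):

```python
def reparameterize(mu, log_var):
    # z = mu + sigma * eps, with eps ~ N(0, I) drawn independently
    # for every sample in the batch. The noise scale is per example
    # and does not depend on the batch size itself; with batch size 5,
    # only 5 noisy samples contribute to each gradient step.
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)
    return mu + eps * std
```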