Expected behavior of a VAE trained on a single example

A quick question. What is the expected behavior of a variational autoencoder that is trained on only a single example, over and over?
I thought I would use this as a sanity check on my architecture, but I’m not seeing the outcome I was expecting, and am wondering if I missed anything.

What I expect to happen is for the decoder to gradually learn to ignore the sampled latent, since there is no variation in the target to explain - the latent is effectively just noise. As this starts to occur, less and less gradient from the reconstruction loss flows back through the latent into the encoder, since the output becomes more and more independent of the latent. The only gradient still affecting the encoder is then the one from the KL divergence, so the encoder learns to output the standard normal prior regardless of the (constant) input, and the KL loss approaches 0.
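For concreteness, this is roughly the kind of sanity check I mean (just a minimal PyTorch-style sketch - the architecture, dimensions, and learning rate are placeholders, not my actual model):

```python
import torch
import torch.nn as nn

# Toy setup: one fixed training example, tiny linear encoder/decoder (placeholders)
x = torch.randn(1, 16)                     # the single, constant training example
enc = nn.Linear(16, 2 * 4)                 # outputs mean and log-variance of a 4-d latent
dec = nn.Linear(4, 16)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

for step in range(10_000):
    mu, logvar = enc(x).chunk(2, dim=-1)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()     # reparameterization trick
    recon = dec(z)
    recon_loss = ((recon - x) ** 2).sum()
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1 - logvar).sum() # KL( q(z|x) || N(0, I) )
    loss = recon_loss + kl
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        # If my reasoning is right, kl should trend toward 0 while recon_loss stays low
        print(step, recon_loss.item(), kl.item())
```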

Is this reasoning correct?

I wasn’t planning on asking for help debugging my code, just my reasoning :slight_smile: - but if the above is correct, what general reasons could there be for the failure? A few possibilities come to mind: auxiliary regularization losses that interfere with the decoder’s freedom to ignore the latent; the lack of a bias term in the decoder; or a learning rate that is too high to settle into that local optimum.
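If it helps frame an answer, this is the kind of quick check I had in mind for whether the decoder has actually learned to ignore the latent (continuing the sketch above; `dec` and the latent size are placeholders):

```python
with torch.no_grad():
    zs = torch.randn(100, 4)          # independent samples from the prior
    outs = dec(zs)                    # decode each latent separately
    spread = outs.std(dim=0).mean()   # near zero if the decoder ignores z
    print(spread.item())
```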