VAE KL divergence is not minimized

I am trying to train a VAE in PyTorch. I followed this tutorial:


But I noticed that the KL divergence term actually increases instead of decreasing over the course of training.
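For reference, the loss I am computing is essentially the standard reconstruction + KL objective. This is a rough sketch of my loss function (variable names are mine, not the tutorial's):

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar):
    # Reconstruction term: binary cross-entropy between the reconstruction
    # and the input, summed over all elements.
    bce = F.binary_cross_entropy(recon_x, x, reduction='sum')
    # KL divergence between q(z|x) = N(mu, sigma^2) and the prior N(0, I):
    # -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    # I backpropagate through the sum and log bce and kld separately.
    return bce + kld, bce, kld
```

The total loss (BCE + KL) does go down, but the KL part by itself goes up.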

My impression was that the KL term should go down similarly to the BCE term during training. Am I missing something here?