RuntimeError: cholesky_cuda: For batch 0: U(16,16) is zero, singular U

I am trying to find the KL divergence between two diagonal Gaussian distributions, say N1 (prior) and N2 (posterior). In code, this looks something like:

_Σ = T.bmm(I, ((_σ).permute(1, 2, 0).clone().repeat(1, 1, I.size()[1]))) + I
_Σ_pr = T.bmm(I, ((_σ_pr).permute(1, 2, 0).clone().repeat(1, 1, I.size()[1]))) + I
Dkl = kl_divergence(MultivariateNormal(_μ.squeeze(), _Σ.squeeze()),MultivariateNormal(_μ_pr.squeeze(), _Σ_pr.squeeze()))

Where I is a batch of identity matrices with shape [batch_size, m, m] (m is the dimension of the Gaussian distribution, and I[x, :, :] is the identity for all x), and _σ, _σ_pr and _μ, _μ_pr are the variances and means of the diagonal Gaussians. _Σ and _Σ_pr are the covariance matrices, whose diagonal elements are the variances of the corresponding Gaussians. I am trying to find the KL divergence of a whole batch at once rather than running a for loop to avoid this error. Is the error caused by one of the diagonal elements of the covariance matrix being zero, so that PyTorch fails when it tries to find the inverse? If so, I also tried adding a number to make the zero diagonal elements non-zero.
I would be really grateful if someone could show me a more efficient or working way of doing this.
PS: I am using Google Colab.

The problem was actually with using torch.bmm to build the diagonal covariance matrix. To be precise, it wasn't working the way I assumed it would, so instead of torch.bmm I used torch.diag_embed().
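For anyone landing here with the same error, below is a minimal sketch of the torch.diag_embed() approach. The tensor names, the shape [batch_size, m] and the random values are assumptions made up for illustration, not the code from the question. Note that torch.bmm with an identity matrix just returns the other operand unchanged, so the original expression most likely added I to a dense matrix of repeated variances instead of producing a diagonal one. The sketch also compares the result against Independent(Normal(...), 1), which should give the same KL for diagonal Gaussians without building full covariance matrices.

```python
import torch
from torch.distributions import (
    Independent,
    MultivariateNormal,
    Normal,
    kl_divergence,
)

torch.manual_seed(0)
batch_size, m = 4, 3  # hypothetical sizes

# Hypothetical stand-ins for the posterior/prior parameters, shape [batch_size, m].
# The variances are kept strictly positive so the covariances are positive definite.
var_post = torch.rand(batch_size, m) + 0.1
var_prior = torch.rand(batch_size, m) + 0.1
mu_post = torch.randn(batch_size, m)
mu_prior = torch.randn(batch_size, m)

# diag_embed puts the last dimension on the diagonal:
# [batch_size, m] -> [batch_size, m, m], zeros everywhere else.
cov_post = torch.diag_embed(var_post)
cov_prior = torch.diag_embed(var_prior)

# Batched KL(posterior || prior), one value per batch element, no for loop.
dkl = kl_divergence(
    MultivariateNormal(mu_post, covariance_matrix=cov_post),
    MultivariateNormal(mu_prior, covariance_matrix=cov_prior),
)

# Alternative for diagonal Gaussians: treat the m dimensions as independent
# univariate Normals. Normal takes a standard deviation, hence the sqrt.
dkl_diag = kl_divergence(
    Independent(Normal(mu_post, var_post.sqrt()), 1),
    Independent(Normal(mu_prior, var_prior.sqrt()), 1),
)

print(dkl.shape, dkl_diag.shape)
```

If the covariance really is diagonal, the Independent(Normal) form may be the cheaper option, since it never runs a Cholesky factorization on an m x m matrix.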