Error using kl_divergence function in PyTorch 0.4.0

I’m trying to use the kl_divergence function with two multivariate normal distributions and I’m getting the following error:

RuntimeError: MAGMA potrf : A(4,4) is 0, A cannot be factorized

The code is this:

import torch
from torch.distributions.multivariate_normal import MultivariateNormal
from torch.distributions import kl_divergence

p = MultivariateNormal(torch.zeros(5).cuda(), torch.eye(5).cuda())
q = MultivariateNormal(torch.randn(1, 5).cuda(), torch.tril(torch.randn(5,5)).cuda())
kl_divergence(p,q)

Do you know why this is happening?

Thank you

potrf is the Cholesky factorization, and it fails when its input can't be factorized. Can you verify that your q's cov matrix is PSD?
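For reference, here's a minimal way to check that (just a sketch; torch.symeig is the 0.4-era API, newer releases use torch.linalg.eigvalsh instead):

import torch

def is_psd(mat):
    # a covariance matrix must be symmetric ...
    if not torch.equal(mat, mat.t()):
        return False
    # ... with non-negative eigenvalues
    eigenvalues, _ = torch.symeig(mat)
    return bool((eigenvalues >= 0).all())

print(is_psd(torch.eye(5)))                   # True
print(is_psd(torch.tril(torch.randn(5, 5))))  # False: not even symmetric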

Note @gelarazi Please consider adding the appropriate imports to your code :slight_smile: (I tried running it, to see what happens, and then had to figure out what all the relevant imports were … ). Edit: oh, also, `#lower triangular 5x5 tensor#` is a variable, not some inline commenting style I didn't know about before. I think it would be easier to reproduce if you created some random tensor (including a seed) to demonstrate the issue with.

@hughperkins I just omitted the imports in the code snippet; of course I added them in my actual code, otherwise the error would be import-related :stuck_out_tongue:

@SimonW I'm applying torch.tril to the matrix before passing it in

Sure, but I didn't add them, and I'm too lazy to go off and add them. You're reducing the pool of potential people who might reply.

@hughperkins Ok, I’ve edited the post by adding the imports. Hope that helps

Ok. So, as Simon pointed out, your covariance matrix needs to be positive semi-definite (strictly speaking positive definite, since potrf is a Cholesky factorization, and that is exactly what fails on your matrix). As the doc at https://pytorch.org/docs/stable/distributions.html#multivariatenormal notes, you can achieve that by multiplying your lower-triangular matrix by its transpose:

import torch
from torch.distributions.multivariate_normal import MultivariateNormal
from torch.distributions import kl_divergence

# original (failing) construction, kept for reference; the tril of a
# random matrix is neither symmetric nor positive definite:
# q = MultivariateNormal(torch.randn(1, 5), torch.tril(torch.randn(5, 5)))

p = MultivariateNormal(torch.zeros(5), torch.eye(5))
print('p.sample()', p.sample())

q_mean = torch.randn(1, 5)
L = torch.tril(torch.randn(5, 5))
q_cov = L @ L.transpose(0, 1)  # L @ L^T is symmetric and PSD by construction
print('q_mean', q_mean)
print('q_cov', q_cov)
q = MultivariateNormal(q_mean, q_cov)
print('q.sample()', q.sample())

kl = kl_divergence(p,q)
print('kl', kl)

Output:

p.sample() tensor([ 0.1836, -0.6165,  0.7646, -0.9500, -1.9736])
q_mean tensor([[ 0.7356, -1.4405,  0.4172,  0.2697,  1.2461]])
q_cov tensor([[ 3.7545, -0.4939,  0.2997,  0.3735,  0.6791],
        [-0.4939,  0.1115,  0.4387, -0.1818,  0.0683],
        [ 0.2997,  0.4387,  7.0361, -2.1879,  2.3761],
        [ 0.3735, -0.1818, -2.1879,  1.1052, -0.5114],
        [ 0.6791,  0.0683,  2.3761, -0.5114,  1.1063]])
q.sample() tensor([[ 4.9458, -1.7962,  1.3363,  0.2006,  1.5021]])
kl tensor([ 117.6113])
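You can also confirm directly that this q_cov is factorizable (a sketch; torch.potrf was the 0.4-era name for the Cholesky routine, newer releases use torch.linalg.cholesky):

import torch

L = torch.tril(torch.randn(5, 5))
q_cov = L @ L.transpose(0, 1)
# potrf raises the MAGMA/LAPACK error from the original post whenever
# its input is not positive definite; here it should succeed
print(torch.potrf(q_cov, upper=False))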

Oh. Do you intend to use the scale_tril= parameter?

If so, you can use that too, but, as the doc says, the lower-triangular matrix needs to have positive diagonal elements:

import torch
from torch.distributions.multivariate_normal import MultivariateNormal
from torch.distributions import kl_divergence

p = MultivariateNormal(torch.zeros(5), torch.eye(5))
print('p.sample()', p.sample())

q_mean = torch.randn(1, 5)
print('q_mean', q_mean)
L = torch.tril(torch.randn(5, 5).abs() + 0.1)  # abs() + 0.1 keeps the diagonal strictly positive
print('L', L)
q = MultivariateNormal(q_mean, scale_tril=L)
print('q.sample()', q.sample())

kl = kl_divergence(p,q)
print('kl', kl)

Output:

p.sample() tensor([ 0.6368, -0.0909,  0.0822,  0.1273,  1.1169])
q_mean tensor([[ 1.1417, -0.2070, -1.2527,  0.4430, -1.5522]])
L tensor([[ 0.6695,  0.0000,  0.0000,  0.0000,  0.0000],
        [ 1.4359,  0.7194,  0.0000,  0.0000,  0.0000],
        [ 1.4802,  0.7240,  0.3760,  0.0000,  0.0000],
        [ 1.2003,  0.3702,  0.6561,  1.0024,  0.0000],
        [ 1.4357,  0.4746,  0.1379,  1.3532,  1.9632]])
q.sample() tensor([[ 1.4201, -0.7672, -1.6811,  1.1081, -1.4706]])
kl tensor([ 30.6731])
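If your L comes from learnable parameters rather than torch.randn, a common trick (just a sketch; raw below stands in for your unconstrained parameter matrix) is to constrain the diagonal yourself:

import torch

raw = torch.randn(5, 5)  # hypothetical unconstrained parameter matrix
# keep the strictly-lower triangle as-is and exp() the diagonal, so the
# diagonal is strictly positive, as scale_tril requires
L = torch.tril(raw, diagonal=-1) + torch.diag(torch.exp(raw.diag()))
print('L', L)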

(Yes, I'm learning as I go along too. That's kind of why answering questions is fun :slight_smile: )

@hughperkins Thank you for the answer and for pointing that out. I thought the internal _scale_tril used by the kl_divergence function did this job for us. :smiley:
