Error using kl_divergence function in PyTorch 0.4.0

I’m trying to use the kl_divergence function with two multivariate normal distributions and I’m getting the following error:

RuntimeError: MAGMA potrf : A(4,4) is 0, A cannot be factorized

The code is this:

import torch
from torch.distributions.multivariate_normal import MultivariateNormal
from torch.distributions import kl_divergence

p = MultivariateNormal(torch.zeros(5).cuda(), torch.eye(5).cuda())
q = MultivariateNormal(torch.randn(1, 5).cuda(), torch.tril(torch.randn(5,5)).cuda())
kl_divergence(p,q)

Do you know why this is happening?

Thank you

potrf is the Cholesky factorization, and it fails when its input can't be factorized. Can you verify that your q's cov matrix is PSD?
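For reference, here's a minimal way to check that (just a sketch; torch.symeig is the 0.4-era API, newer releases use torch.linalg.eigvalsh instead):

import torch

def is_psd(mat):
    # a covariance matrix must be symmetric ...
    if not torch.equal(mat, mat.t()):
        return False
    # ... with non-negative eigenvalues
    eigenvalues, _ = torch.symeig(mat)
    return bool((eigenvalues >= 0).all())

print(is_psd(torch.eye(5)))                   # True
print(is_psd(torch.tril(torch.randn(5, 5))))  # False: not even symmetric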

Note @gelarazi Please consider adding the appropriate imports to your code :slight_smile: (I tried running it, to see what happens, and then had to figure out what all the relevant imports were … ). Edit: oh, also, `#lower triangular 5x5 tensor#` is a variable, not some inline commenting style I didn't know about before. I think it would be easier to reproduce if you created some random tensor (including a seed) to demonstrate the issue with.

@hughperkins I just omitted the imports in the code snippet; of course I added them in my actual code, otherwise the error would be import-related :stuck_out_tongue:

@SimonW I'm applying torch.tril to the matrix before passing it in

Sure, but I didn't add them, and I'm too lazy to go off and add them. You're reducing the pool of potential people who might reply.

@hughperkins Ok, I’ve edited the post by adding the imports. Hope that helps

Ok. So, as Simon pointed out, your covariance matrix needs to be positive semi-definite (strictly speaking positive definite, since potrf is a Cholesky factorization, and that is exactly what fails on your matrix). As the doc at https://pytorch.org/docs/stable/distributions.html#multivariatenormal notes, you can achieve that by multiplying your lower-triangular matrix by its transpose:

import torch
from torch.distributions.multivariate_normal import MultivariateNormal
from torch.distributions import kl_divergence

# original (failing) construction, kept for reference; the tril of a
# random matrix is neither symmetric nor positive definite:
# q = MultivariateNormal(torch.randn(1, 5), torch.tril(torch.randn(5, 5)))

p = MultivariateNormal(torch.zeros(5), torch.eye(5))
print('p.sample()', p.sample())

q_mean = torch.randn(1, 5)
L = torch.tril(torch.randn(5, 5))
q_cov = L @ L.transpose(0, 1)  # L @ L^T is symmetric and PSD by construction
print('q_mean', q_mean)
print('q_cov', q_cov)
q = MultivariateNormal(q_mean, q_cov)
print('q.sample()', q.sample())

kl = kl_divergence(p,q)
print('kl', kl)

Output:

p.sample() tensor([ 0.1836, -0.6165,  0.7646, -0.9500, -1.9736])
q_mean tensor([[ 0.7356, -1.4405,  0.4172,  0.2697,  1.2461]])
q_cov tensor([[ 3.7545, -0.4939,  0.2997,  0.3735,  0.6791],
        [-0.4939,  0.1115,  0.4387, -0.1818,  0.0683],
        [ 0.2997,  0.4387,  7.0361, -2.1879,  2.3761],
        [ 0.3735, -0.1818, -2.1879,  1.1052, -0.5114],
        [ 0.6791,  0.0683,  2.3761, -0.5114,  1.1063]])
q.sample() tensor([[ 4.9458, -1.7962,  1.3363,  0.2006,  1.5021]])
kl tensor([ 117.6113])
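You can also confirm directly that this q_cov is factorizable (a sketch; torch.potrf was the 0.4-era name for the Cholesky routine, newer releases use torch.linalg.cholesky):

import torch

L = torch.tril(torch.randn(5, 5))
q_cov = L @ L.transpose(0, 1)
# potrf raises the MAGMA/LAPACK error from the original post whenever
# its input is not positive definite; here it should succeed
print(torch.potrf(q_cov, upper=False))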

Oh. Do you intend to use the scale_tril= parameter?

If so, you can use that too, but, as the doc says, the lower-triangular matrix needs to have positive diagonal elements:

import torch
from torch.distributions.multivariate_normal import MultivariateNormal
from torch.distributions import kl_divergence

p = MultivariateNormal(torch.zeros(5), torch.eye(5))
print('p.sample()', p.sample())

q_mean = torch.randn(1, 5)
print('q_mean', q_mean)
L = torch.tril(torch.randn(5, 5).abs() + 0.1)  # abs() + 0.1 keeps the diagonal strictly positive
print('L', L)
q = MultivariateNormal(q_mean, scale_tril=L)
print('q.sample()', q.sample())

kl = kl_divergence(p,q)
print('kl', kl)

Output:

p.sample() tensor([ 0.6368, -0.0909,  0.0822,  0.1273,  1.1169])
q_mean tensor([[ 1.1417, -0.2070, -1.2527,  0.4430, -1.5522]])
L tensor([[ 0.6695,  0.0000,  0.0000,  0.0000,  0.0000],
        [ 1.4359,  0.7194,  0.0000,  0.0000,  0.0000],
        [ 1.4802,  0.7240,  0.3760,  0.0000,  0.0000],
        [ 1.2003,  0.3702,  0.6561,  1.0024,  0.0000],
        [ 1.4357,  0.4746,  0.1379,  1.3532,  1.9632]])
q.sample() tensor([[ 1.4201, -0.7672, -1.6811,  1.1081, -1.4706]])
kl tensor([ 30.6731])
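If your L comes from learnable parameters rather than torch.randn, a common trick (just a sketch; raw below stands in for your unconstrained parameter matrix) is to constrain the diagonal yourself:

import torch

raw = torch.randn(5, 5)  # hypothetical unconstrained parameter matrix
# keep the strictly-lower triangle as-is and exp() the diagonal, so the
# diagonal is strictly positive, as scale_tril requires
L = torch.tril(raw, diagonal=-1) + torch.diag(torch.exp(raw.diag()))
print('L', L)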

(Yes, I'm learning as I go along too. That's kind of why answering questions is fun :slight_smile: )

@hughperkins Thank you for the answer and for pointing that out. I thought the internal _scale_tril used by the kl_divergence function did this job for us. :smiley:
