How to calculate the multivariate normal distribution using pytorch and math?

The multivariate normal distribution is given as
$$p(x) = \frac{1}{\sqrt{(2\pi)^d \det(\Sigma)}} \, \exp\!\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$
For example, the formula can be computed with numpy as follows:

import numpy as np

def multivariate_normal_distribution(x, d, mean, covariance):
  x_m = x - mean
  # x_m.T @ inv(covariance) @ x_m, computed via a linear solve instead of an explicit inverse
  return (1.0 / np.sqrt((2 * np.pi)**d * np.linalg.det(covariance))
          * np.exp(-np.linalg.solve(covariance, x_m).T.dot(x_m) / 2))
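
For example, evaluating it at an illustrative point of a 2-d standard normal (continuing the snippet above):

x = np.array([1.0, 2.0])
mean = np.zeros(2)
covariance = np.eye(2)
print(multivariate_normal_distribution(x, 2, mean, covariance))
# prints approximately 0.013064 (= exp(-2.5) / (2 * pi))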

I want to do the same calculation, but using pytorch and math instead of numpy. The idea is the following:

def multivariate_normal_distribution(x, d, mean, covariance):
  x_m = x - mean
  return (1.0 / math.sqrt((2 * math.pi)**d * torch.linalg.det(covariance))
          * torch.exp(-np.linalg.solve(covariance, x_m).T.dot(x_m) / 2))

The problem is in np.linalg.solve(). I know that with .solve it is possible to solve a linear matrix equation or a system of linear scalar equations. So I want to know: is there a pytorch function that does exactly the same? Help is much appreciated!

Did you try this?

from torch.distributions import multivariate_normal
dist = multivariate_normal.MultivariateNormal(loc=torch.zeros(5), covariance_matrix=torch.eye(5))
probability = torch.exp(dist.log_prob(torch.randn(5)))
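
If it helps, your whole function could then shrink to something like this (just a sketch; the name multivariate_normal_pdf is illustrative, and d is no longer needed because MultivariateNormal infers the dimension):

import torch
from torch.distributions import multivariate_normal

def multivariate_normal_pdf(x, mean, covariance):
    # log_prob evaluates the full log-density, so exponentiating gives the pdf
    dist = multivariate_normal.MultivariateNormal(loc=mean, covariance_matrix=covariance)
    return torch.exp(dist.log_prob(x))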


Thanks a lot, I will try that. But do you know if there is a function in pytorch that does the same as np.linalg.solve in the above formula? Thank you very much!

Hi John!

torch.solve is pytorch’s general linear-equation solver.
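
For example (note that the right-hand side comes first in the call):

A = torch.randn (3, 3)
B = torch.randn (3, 1)
solution, LU = torch.solve (B, A)   # solves A @ solution == B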

Your covariance should be symmetric positive definite, so, if you cared, you could go the Cholesky-factorization route and use:

torch.cholesky
torch.cholesky_solve

(You could also pre-compute the covariance’s inverse with
torch.inverse or torch.cholesky_inverse.)
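
For instance, the quadratic form in your exponent could be computed along those lines (a sketch with placeholder values, assuming the covariance is symmetric positive definite and x_m is a column vector):

covariance = torch.eye (5)             # placeholder SPD covariance
x_m = torch.randn (5, 1)               # placeholder column vector x - mean
L = torch.cholesky (covariance)        # lower-triangular Cholesky factor
cov_inv = torch.cholesky_inverse (L)   # covariance inverse from its factor
quad = x_m.T @ cov_inv @ x_m           # (x - mean)^T Sigma^-1 (x - mean)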

But unless you have an unusual use case, I would go with Arul’s
suggestion and use
torch.distributions.multivariate_normal.MultivariateNormal.

Best.

K. Frank


Hi @KFrank

so you mean that, if I have a given symmetric positive definite covariance, I could also write:

torch.exp(-(torch.cholesky_solve(covariance, x_m).T.dot(x_m)))

for the term inside the exponential?

Thank you very much for your help!

@InnovArul I still have one question. Is the part for the Mahalanobis distance in the formula you wrote:
dist = multivariate_normal.MultivariateNormal(loc=torch.zeros(5), covariance_matrix=torch.eye(5))
the same as
$-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)$

because in the literature the Mahalanobis distance is given with a square root instead of a factor of -0.5?
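
(For reference, the textbook definition is $D_M(x) = \sqrt{(x - \mu)^T \Sigma^{-1} (x - \mu)}$, so a factor of $-0.5$ corresponds to $-\tfrac{1}{2} D_M(x)^2$ in the exponent.)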

Hi John!

Basically, but torch.cholesky_solve() takes the Cholesky factor matrix,
so you have to do it in two steps.

Here is an illustrative script:

import torch
torch.__version__

_ = torch.random.manual_seed (2020)

ma = torch.randn ((5, 5))
mspd = ma @ ma.T                           # make an SPD matrix
v = torch.randn ((5, 1))                   # make a vector
solveA = torch.solve (v, mspd)[0]          # solve direct
mcho = torch.cholesky (mspd)               # Cholesky factorization
solveB = torch.cholesky_solve (v, mcho)    # solve with Cholesky factorization
print ('solveA =', solveA)
print ('solveB =', solveB)
print ('solveA - solveB =', solveA - solveB)

And here is its output:

>>> import torch
>>> torch.__version__
'1.6.0'
>>> _ = torch.random.manual_seed (2020)
>>> ma = torch.randn ((5, 5))
>>> mspd = ma @ ma.T                           # make an SPD matrix
>>> v = torch.randn ((5, 1))                   # make a vector
>>> solveA = torch.solve (v, mspd)[0]          # solve direct
>>> mcho = torch.cholesky (mspd)               # Cholesky factorization
>>> solveB = torch.cholesky_solve (v, mcho)    # solve with Cholesky factorization
>>> print ('solveA =', solveA)
solveA = tensor([[ 6.6744],
        [-1.5211],
        [ 1.1208],
        [ 1.1884],
        [-3.9746]])
>>> print ('solveB =', solveB)
solveB = tensor([[ 6.6743],
        [-1.5211],
        [ 1.1208],
        [ 1.1884],
        [-3.9745]])
>>> print ('solveA - solveB =', solveA - solveB)
solveA - solveB = tensor([[ 1.7643e-05],
        [-4.2915e-06],
        [ 2.1458e-06],
        [ 3.0994e-06],
        [-9.2983e-06]])

Best.

K. Frank


You can take a look at the source code here if it helps:

They calculate squared Mahalanobis distance in _batch_mahalanobis().
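
A quick numerical check of that relation (a sketch for the standard-normal case, where the squared Mahalanobis distance reduces to x.x):

import math
import torch
from torch.distributions import multivariate_normal

d = 5
x = torch.randn(d)
dist = multivariate_normal.MultivariateNormal(loc=torch.zeros(d), covariance_matrix=torch.eye(d))
# log N(x; 0, I) = -0.5 * squared_mahalanobis - 0.5 * d * log(2 * pi)
expected = -0.5 * (x @ x) - 0.5 * d * math.log(2 * math.pi)
print(torch.allclose(dist.log_prob(x), expected))  # True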


@KFrank I should have read your last post a little more closely, because you wrote "Cholesky-factorization route" and listed both functions. Thank you for the helpful code snippet.

So in my case I just have to write:

tmp = torch.cholesky(covariance)       # Cholesky factor of the covariance
res = torch.cholesky_solve(x_m, tmp)   # solves covariance @ res = x_m

Does it matter whether cholesky_solve() takes the vector first and then the matrix, or the other way round? And what if x_m is a distance matrix whose entries are Euclidean distances? I know that the Mahalanobis distance measures the distance between a data point and a distribution, but is it desirable to first calculate a Euclidean distance with x_m and then subsequently calculate the Mahalanobis distance?

Hey guys, thanks for the replies. In my case I have a single covariance matrix for a neighborhood (kernel) of a given pixel of an image, and I need to compute the log_prob for all the pixels in the image using their neighborhoods. That is, for each pixel (i, j) I have to use a fixed, previously computed covariance matrix and mean to compute the multivariate normal log_prob over the neighboring pixels [i±k, j±k], where k is the kernel size. One approach is to use torch.Tensor.unfold to generate a view of the image tensor with windows matching the kernel size, reshape it, and pass it to log_prob of the MultivariateNormal. This approach works, but it uses a lot of memory. Would anyone know a faster/better way to do that?
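
For reference, here is a minimal sketch of the unfold approach described above (the image shape, kernel size, and statistics are placeholder assumptions):

import torch
from torch.distributions import multivariate_normal

k = 3                                          # kernel size (assumed)
img = torch.randn(64, 64)                      # single-channel image (assumed)
d = k * k                                      # dimensionality of one neighborhood

mean = torch.zeros(d)                          # placeholder for the precomputed mean
cov = torch.eye(d)                             # placeholder for the precomputed covariance
dist = multivariate_normal.MultivariateNormal(loc=mean, covariance_matrix=cov)

# sliding k x k windows over height and width, one row per pixel neighborhood
patches = img.unfold(0, k, 1).unfold(1, k, 1)  # shape (H-k+1, W-k+1, k, k)
patches = patches.reshape(-1, d)               # materializes the view -> memory heavy
log_probs = dist.log_prob(patches)             # one log-density per window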