Implementing The Diagonal Multivariate Gaussian Distribution

Hello there,

I need to use - or implement - a means of calculating the probability density function of a diagonal, multivariate Gaussian distribution.

The context is that I am building an “Active Inference” agent based model to solve a version of the CartPole-v1 environment: Cart Pole - Gymnasium Documentation

The agent receives a noisy observation vector and must operate on the basis of this noisy/imperfect observation.

I will omit the details as to what my method actually is, I’ll only say that it necessitates the ability to calculate the probability density function (pdf) of a diagonal multivariate Gaussian distribution. I know that PyTorch offers this functionality out of the box with the MultivariateNormal class: Probability distributions - torch.distributions — PyTorch 2.0 documentation. Indeed, I have attempted to use this class, in addition to my own “from scratch” implementation, but I get the same problem in any case.

After importing:

import torch
import torch.distributions.multivariate_normal as mvn
import numpy as np

My implementation is as follows - note that the print statements are for debugging only:

def diag_mvg_pdf_torch(samples, means, diag_vars):
    """
    Compute the PDF of a diagonal multivariate Gaussian distribution using PyTorch.

    Parameters:
    - samples (torch.Tensor): The samples tensor (batch_size x num_features).
    - means (torch.Tensor): The means tensor (batch_size x num_features).
    - diag_vars (torch.Tensor): The diagonal elements of the covariance matrix (batch_size x num_features).

    Returns:
    - pdf (torch.Tensor): The average PDF value over the batch.
    """
    batch_size = len(samples)

    final_pdf = 0

    for i in range(batch_size):
        mvn_dist_i = mvn.MultivariateNormal(
            loc=means[i],
            covariance_matrix=torch.diag(diag_vars[i])
        )

        pdf_i = torch.exp(mvn_dist_i.log_prob(samples[i]))
        final_pdf += pdf_i

        print(f"\npdf_i: {pdf_i}")
        print(f"samples[i]: {samples[i]}")
        print(f"means[i]: {means[i]}")
        print(f"diag_vars[i]: {diag_vars[i]}\n")

    # Compute the average PDF over the batch
    pdf = final_pdf / batch_size

    return pdf

This function is supposed to instantiate a multivariate Gaussian pdf, parameterised by the mean vector and variances in each row of “means” and “diag_vars” (respectively). The pdf value of each respective sample in “samples” is computed and the average value of these pdfs should be returned.

The batch size is 64, so this code should instantiate 64 diagonal multivariate Gaussians, compute the pdf of each sample with respect to exactly one, unique, instantiated Gaussian and then return the average of these pdf values.

My “samples”, “means” and “diag_vars” tensors look like this - I’ve concatenated their full extent to save space:

# reparameterized samples
samples = torch.tensor([
                [ 0.0317,  0.0163, -0.0062,  0.0694],
                [ 0.1244, -0.1245,  0.0240,  0.0954],
               ...
                [-0.1707,  0.4822,  0.0498,  0.1098],
                [ 0.1741,  0.0472,  0.1415,  0.1108]
        ])

        # predicted mean vectors
        means = torch.tensor([
                [-0.0248,  0.1706,  0.0314,  0.1002],
                [-0.0248,  0.1706,  0.0314,  0.1002],
                ...
                [ 0.0237,  0.1656,  0.0014,  0.1111],
                [ 0.0237,  0.1656,  0.0014,  0.1111]
        ])

        # predicted variances 
        diag_vars = torch.tensor([
                [8.1372e-03, 2.6815e-02, 3.7143e-02, 1.4694e-03],
                [8.1372e-03, 2.6815e-02, 3.7143e-02, 1.4694e-03],
                ...
                [1.7298e-02, 6.7676e-02, 3.3873e-02, 8.5967e-07],
                [1.7298e-02, 6.7676e-02, 3.3873e-02, 8.5967e-07]
        ])

Now the output of this following code:

pdf = diag_mvg_pdf_torch(samples, means, diag_vars)
print(f"\npdf: {pdf}\n")

is this:

...
pdf_i: 1966.2965087890625
samples[i]: tensor([0.0185, 0.1974, 0.2239, 0.1114])
means[i]: tensor([0.0237, 0.1656, 0.0014, 0.1111])
diag_vars[i]: tensor([1.7298e-02, 6.7676e-02, 3.3873e-02, 8.5967e-07])


pdf_i: 39.062076568603516
samples[i]: tensor([ 0.1559,  0.2052, -0.1978,  0.1238])
means[i]: tensor([0.0253, 0.1559, 0.0055, 0.1171])
diag_vars[i]: tensor([2.0956e-02, 6.7220e-02, 2.8870e-02, 9.5371e-06])


pdf_i: 250.84556579589844
samples[i]: tensor([-0.1707,  0.4822,  0.0498,  0.1098])
means[i]: tensor([0.0237, 0.1656, 0.0014, 0.1111])
diag_vars[i]: tensor([1.7298e-02, 6.7676e-02, 3.3873e-02, 8.5967e-07])


pdf_i: 1444.880859375
samples[i]: tensor([0.1741, 0.0472, 0.1415, 0.1108])
means[i]: tensor([0.0237, 0.1656, 0.0014, 0.1111])
diag_vars[i]: tensor([1.7298e-02, 6.7676e-02, 3.3873e-02, 8.5967e-07])


pdf: 485.08538818359375

where this is only the last few lines of the output.

Clearly this is not working correctly, since the final “averaged” pdf value is about 485. Indeed, each of the 64 “pdf_i” values are usually vastly larger than 1.0.

I have tried several things, like implementing the pdf calculation “from scratch”:

def diag_mvg_pdf_from_scratch(samples, means, diag_vars):
    """
    Compute the PDF of a diagonal multivariate Gaussian distribution using PyTorch.

    Parameters:
    - samples (torch.Tensor): The samples tensor (batch_size x num_features).
    - means (torch.Tensor): The means tensor (batch_size x num_features).
    - diag_vars (torch.Tensor): The diagonal elements of the covariance matrix (batch_size x num_features).

    Returns:
    - pdf (torch.Tensor): The average PDF value over the batch.
    """
    batch_size = len(samples)

    final_pdf = 0

    for i in range(batch_size):

        exponent = (-(samples[i] - means[i]) ** 2)/(2 * diag_vars[i])
        normalization_term = (2*np.pi*diag_vars[i]) ** (-1/2)
        pdf_i_dims = normalization_term * torch.exp(exponent)
        pdf_i = torch.prod(pdf_i_dims)

        print(f"\nexponent: {exponent}")
        print(f"normalization_term: {normalization_term}")
        print(f"pdf_i_dims: {pdf_i_dims}")
        print(f"pdf_i: {pdf_i}")
        print(f"samples[i]: {samples[i]}")
        print(f"means[i]: {means[i]}")
        print(f"diag_vars[i]: {diag_vars[i]}\n")

    # Compute the average PDF over the batch
    pdf = final_pdf / batch_size

    return pdf

but this has exactly the same result.

I suspect that the issue will have to do with the “samples”, “means” and “log_vars” themselves, since “means” and “log_vars” are actually predicted by a feed-forward neural network. “samples” are samples drawn from the this distribution, by means of the “reparameterization trick”.

I would appreciate any thoughts/assistance that anyone might have to offer. I’m very happy to elaborate on anything that you desire.

Many thanks for reading this far!

Hi Ronnie!

Your probability-density-function implementation is correct.

Your “from scratch” implementation is also correct and is a successful
cross-check on your first implementation.

You haven’t said exact;ly why you think this is wrong, but is seems that you
are surprised by the “large” value of the pdf.

In this case, I think that your intuition is misleading you. As the variance
(or, similarly, the standard deviation) of a gaussian becomes small, the peak
value of that gaussian’s pdf becomes large. This effect is enhanced in multiple
dimensions because the peak value of the multidimensional pdf is (in a
well-defined sense) the product of the one-dimensional peaks. And your
variances are small.

Consider:

>>> import torch
>>> torch.__version__
'2.0.1'
>>> import math
>>> diag_vars = torch.tensor ([1.7298e-02, 6.7676e-02, 3.3873e-02, 8.5967e-07])
>>> peak_pdf = (1.0 / (2.0 * math.pi * diag_vars)**0.5).prod()
>>> peak_pdf
tensor(4338.4268)

So for this particular example, the pdf you calculate – 1966.2965 – is less than
half of the peak value (which would be obtained if your samples were equal to
your means), which is to say that it’s not unreasonably large.

Best.

K. Frank

Thank you very much @KFrank

Apologies for the late reply.

Yes I see now that I have just confused probability with probability density value. I have now fixed the issue. I didn’t actually need the pdf value at all in the end, to do what I wanted.

Thank you for your time!