Use KL divergence as loss between two multivariate Gaussians

Hi,

I want to use KL divergence as the loss function between two multivariate Gaussians. Is the following the right way to do it?

import torch

B, D = 32, 10  # example batch size and Gaussian dimension

# first (diagonal) Gaussian, with learnable parameters
mu1 = torch.rand((B, D), requires_grad=True)
std1 = torch.rand((B, D), requires_grad=True)
p = torch.distributions.Normal(mu1, std1)

# second (target) Gaussian
mu2 = torch.rand((B, D))
std2 = torch.rand((B, D))
q = torch.distributions.Normal(mu2, std2)

loss = torch.distributions.kl_divergence(p, q).mean()
loss.backward()

My understanding is that torch.distributions.kl_divergence computes KL(p, q) as in the derivations in section 9 of this document.
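
For reference, a minimal sketch that checks torch.distributions.kl_divergence for two univariate Gaussians against the usual closed-form expression (the tensor values here are arbitrary):

import torch
from torch.distributions import Normal, kl_divergence

mu1, std1 = torch.tensor([0.5]), torch.tensor([1.0])
mu2, std2 = torch.tensor([1.5]), torch.tensor([2.0])

# Closed form: KL(N(mu1, std1^2) || N(mu2, std2^2))
#            = log(std2/std1) + (std1^2 + (mu1 - mu2)^2) / (2 * std2^2) - 1/2
kl_closed_form = (torch.log(std2 / std1)
                  + (std1**2 + (mu1 - mu2)**2) / (2 * std2**2)
                  - 0.5)

kl_torch = kl_divergence(Normal(mu1, std1), Normal(mu2, std2))

print(kl_closed_form, kl_torch)  # the two values should agree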


Any update on this question?

Hi,

You are right. When you are using distributions from the torch.distributions package, you are fine using torch.distributions.kl_divergence. But if you want to compute the KL by passing two tensors obtained elsewhere, you can use the following approach:

@Rojin I have posted this on your thread actually.

This is the KL divergence between two arbitrary tensors (e.g. layer outputs).
Just be aware that the input a must contain log-probabilities and the target b must contain probabilities.

https://pytorch.org/docs/stable/nn.functional.html?highlight=kl_div#kl-div
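
A minimal sketch of that usage (the tensors here are just illustrative), with the input as log-probabilities and the target as probabilities:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)                # e.g. raw outputs of a layer
a = F.log_softmax(logits, dim=-1)          # input: log-probabilities
b = F.softmax(torch.randn(4, 10), dim=-1)  # target: probabilities

# 'batchmean' divides the summed KL by the batch size, which matches
# the mathematical definition of the mean KL over the batch
loss = F.kl_div(a, b, reduction='batchmean')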

By the way, PyTorch uses this approach:


https://pytorch.org/docs/stable/distributions.html?highlight=kl_div#torch.distributions.kl.kl_divergence

Good luck
Nik

Hi,
I tested kl_divergence as follows; the tensors' shapes are [batch_size, n], where n is my Gaussian distribution dimension (3 here):

mu1 = torch.Tensor([[1., 2., 3.],
                    [2., 3., 4.]])
var_1 = torch.Tensor([[1., 1., 1.],
                      [4., 9., 16.]])

mu2 = torch.Tensor([[1., 3., 4.],
                    [2., 3., 4.]])
var_2 = torch.Tensor([[1., 4., 9.],
                      [4., 9., 16.]])

p = torch.distributions.Normal(mu1, var_1)
q = torch.distributions.Normal(mu2, var_2)
kl_loss = torch.distributions.kl_divergence(p, q)

print(kl_loss)

The output is:

tensor([[0.6542, 0.9488, 1.7096],
        [0.0000, 0.0000, 0.0000]])

Why are there 3 numbers for each instance? In my understanding, the case above contains two pairs (p, q), and each pair should have a scalar KL loss.

Any ideas? Thank you

Ooh, I got it!

In my case above, I assumed the distributions to be multivariate Gaussians, so I should use torch.distributions.MultivariateNormal(mu1, var_1) instead, which gives what I expected. :sweat_smile:
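
For a diagonal covariance, a minimal sketch (assuming the [batch_size, n] tensors above hold per-dimension variances) is to expand them into [batch_size, n, n] covariance matrices with torch.diag_embed, so that kl_divergence returns one value per batch element:

import torch
from torch.distributions import MultivariateNormal, kl_divergence

mu1 = torch.tensor([[1., 2., 3.],
                    [2., 3., 4.]])
var_1 = torch.tensor([[1., 1., 1.],
                      [4., 9., 16.]])
mu2 = torch.tensor([[1., 3., 4.],
                    [2., 3., 4.]])
var_2 = torch.tensor([[1., 4., 9.],
                      [4., 9., 16.]])

# Turn the per-dimension variances into diagonal covariance matrices
# of shape [batch_size, n, n]
p = MultivariateNormal(mu1, covariance_matrix=torch.diag_embed(var_1))
q = MultivariateNormal(mu2, covariance_matrix=torch.diag_embed(var_2))

kl_loss = kl_divergence(p, q)
print(kl_loss.shape)  # torch.Size([2]) -- one KL value per batch element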


Hi,
may I ask why you set requires_grad=True for mu1 & std1 but not for mu2 & std2?
Is there something we should watch out for?

Hi,
I ran into a problem where MultivariateNormal does not allow a [batch, n] shape unless batch and n are equal (the covariance argument must be square in its last two dimensions). Did you manage to solve this?

Is the KL calculation correct?
The formula looks like log(a/b) = log(a) - b???
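
A quick numeric sketch of what F.kl_div computes (the probabilities below are arbitrary): the input is expected in log-space, so the pointwise term target * (log(target) - input) is really target * log(target / q), not log(a) - b:

import torch
import torch.nn.functional as F

p = torch.tensor([0.2, 0.3, 0.5])  # target: probabilities
q = torch.tensor([0.1, 0.4, 0.5])  # the other distribution, as probabilities

# F.kl_div takes the input in log-space, hence q.log()
kl_functional = F.kl_div(q.log(), p, reduction='sum')

# Direct computation of KL(p || q) = sum_i p_i * log(p_i / q_i)
kl_manual = (p * (p / q).log()).sum()

print(kl_functional.item(), kl_manual.item())  # both values should agree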