WERush
(Xinge)
April 22, 2018, 8:43am
#1
When I use nn.KLDivLoss(), it returns negative values. For example:

```
import torch
import torch.nn as nn
from torch.autograd import Variable

a1 = Variable(torch.FloatTensor([0.1, 0.2]))
a2 = Variable(torch.FloatTensor([0.3, 0.6]))
a3 = Variable(torch.FloatTensor([0.3, 0.6]))
a4 = Variable(torch.FloatTensor([-0.3, -0.6]))
a5 = Variable(torch.FloatTensor([-0.3, -0.6]))
c1 = nn.KLDivLoss()(a1, a2) #==> -0.4088
c2 = nn.KLDivLoss()(a2, a3) #==> -0.5588
c3 = nn.KLDivLoss()(a4, a5) #==> 0
c4 = nn.KLDivLoss()(a3, a4) #==> 0
c5 = nn.KLDivLoss()(a1, a4) #==> 0
```

In theory, the KL divergence loss should never be negative. Can anyone explain this?

Thanks.
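For reference, `nn.KLDivLoss` treats `input` as log-probabilities: it computes the pointwise terms `target * (log(target) - input)` and, with the default reduction, averages them over all elements. It never takes the log of `input` itself, so passing raw probabilities does not yield a KL divergence and can easily go negative. A minimal sketch, assuming a current PyTorch release (plain tensors in place of `Variable`), that reproduces `c1` by hand:

```python
import torch
import torch.nn as nn

a1 = torch.tensor([0.1, 0.2])  # passed as `input` (meant to be log-probabilities)
a2 = torch.tensor([0.3, 0.6])  # passed as `target` (probabilities)

loss = nn.KLDivLoss()(a1, a2)  # default reduction: mean over all elements

# Same computation written out: mean of target * (log(target) - input)
manual = (a2 * (a2.log() - a1)).mean()

print(loss.item(), manual.item())  # both about -0.4088
```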

agadetsky
(Artyom)
April 22, 2018, 11:17am
#2
Your variables a1, a2, a3, a4, a5 are not proper distributions, i.e. they don’t sum to 1.

WERush
(Xinge)
April 22, 2018, 11:55am
#3
```
import torch
import torch.nn as nn
from torch.autograd import Variable

a2 = Variable(torch.FloatTensor([0.3, 0.6, 0.1]))
a1 = Variable(torch.FloatTensor([0.4, 0.5, 0.1]))
c1 = nn.KLDivLoss()(a1, a2) #==> -0.4426
```

tom
(Thomas V)
April 22, 2018, 11:59am
#4
They still aren’t distributions.
Keep in mind that the loss functions take batches, so you’d want to unsqueeze(0).

Best regards

Thomas

WERush
(Xinge)
April 22, 2018, 12:02pm
#5
```
a1.unsqueeze_(0)
a2.unsqueeze_(0)
c1 = nn.KLDivLoss()(a1,a2) #==> -0.4426
```
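The unchanged value is expected: under the default `'mean'` reduction the pointwise terms are averaged over every element, so adding a batch dimension alone does not change the result, and `input` here still holds probabilities rather than log-probabilities. A quick check, assuming a current PyTorch release:

```python
import torch
import torch.nn as nn

a2 = torch.tensor([0.3, 0.6, 0.1])
a1 = torch.tensor([0.4, 0.5, 0.1])

kl = nn.KLDivLoss()  # default reduction='mean'
flat = kl(a1, a2)                               # shape (3,)
batched = kl(a1.unsqueeze(0), a2.unsqueeze(0))  # shape (1, 3)

print(flat.item(), batched.item())  # identical, about -0.4426
```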

tom
(Thomas V)
April 22, 2018, 1:07pm
#6
…and x (but not y) should be log-probabilities, as the docs say. My apologies for missing that at first.

Best regards

Thomas

Edit: so, back at the (PyTorch 0.4) computer:

```
import torch
a2 = torch.FloatTensor([0.3, 0.6, 0.1])
a1 = torch.FloatTensor([0.4, 0.5, 0.1])
c1 = torch.nn.KLDivLoss(size_average=False)(a1.log(),a2)
print(c1)
```

gives `2.3088`, the same as explicitly computing

```
(a2*(a2.log()-a1.log())).sum()
```
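In later PyTorch releases, `size_average=False` is deprecated in favour of `reduction='sum'`. A sketch of the same check with that argument, assuming a current release; the sum comes out at roughly 0.0231, which is the value reported further down the thread:

```python
import torch
import torch.nn as nn

p = torch.tensor([0.3, 0.6, 0.1])  # target distribution (a2)
q = torch.tensor([0.4, 0.5, 0.1])  # model distribution (a1)

# reduction='sum' replaces the deprecated size_average=False
loss = nn.KLDivLoss(reduction="sum")(q.log(), p)

# Explicit KL(p || q) = sum_i p_i * (log p_i - log q_i)
explicit = (p * (p.log() - q.log())).sum()

print(loss.item(), explicit.item())  # both about 0.0231
```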

Note that, by default, PyTorch erroneously averages the terms over all elements rather than summing them.
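Later PyTorch releases replace `size_average` with a `reduction` argument. Its `'batchmean'` option divides the summed terms by the batch size only, which matches the mathematical definition (the average KL divergence per row), whereas `'mean'` also divides by the number of classes. A sketch comparing the options on a hypothetical batch of two distributions:

```python
import torch
import torch.nn.functional as F

# Hypothetical batch: 2 rows, each a distribution over 3 classes
target = torch.tensor([[0.3, 0.6, 0.1],
                       [0.2, 0.2, 0.6]])
log_pred = torch.log(torch.tensor([[0.4, 0.5, 0.1],
                                   [0.3, 0.3, 0.4]]))

total = F.kl_div(log_pred, target, reduction="sum")
per_row = F.kl_div(log_pred, target, reduction="batchmean")  # total / batch size
per_elem = F.kl_div(log_pred, target, reduction="mean")      # total / (batch * classes)

print(total.item(), per_row.item(), per_elem.item())
```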


WERush
(Xinge)
April 23, 2018, 2:42am
#8
OK, I get it. Thanks very much.

Pei_Wang
(wynonna)
October 22, 2019, 3:55am
#9
I still get negative output

Same. I created a post over here since my negative KL divergence values might be caused by the output of `torch.nn.functional.softmax` producing vectors that don’t sum to 1.

I started receiving negative KL divergences between a target Dirichlet distribution and my model’s output Dirichlet distribution. Someone online suggested that this might indicate that the parameters of the Dirichlet distribution don’t sum to 1. I thought this was ridiculous, since the output of the model is passed through `output = F.softmax(self.weights(x), dim=1)`. But after looking into it more closely, I found that `torch.all(torch.sum(output, dim=1) == 1.)` returns False! Looking at the …

MFajcik1
(Martin Fajčík)
January 20, 2020, 10:02am
#11
Did you normalize the values with log_softmax?

```
torch.nn.KLDivLoss(size_average=False)(F.log_softmax(scores, -1), targets)
```
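A self-contained version of that call, assuming raw model `scores` and `targets` that are proper distributions (both tensors here are hypothetical, the targets built with `softmax`), and using `reduction='sum'`, the current spelling of `size_average=False`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
scores = torch.randn(2, 5)                      # hypothetical raw model outputs
targets = F.softmax(torch.randn(2, 5), dim=-1)  # hypothetical target distributions

# input must be log-probabilities, target plain probabilities
loss = nn.KLDivLoss(reduction="sum")(F.log_softmax(scores, -1), targets)
print(loss.item())  # non-negative, since every target row sums to 1
```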


Hi Tom, may I ask why those values are still not distributions? Thanks.

tom
(Thomas V)
May 21, 2020, 11:42am
#13
Because, without a batch dimension, a2 is treated as a batch of three separate entries with values 0.3, 0.6 and 0.1, rather than as a single batch entry holding three values.
Also, as noted in my earlier post (which shows the correct usage, except that one would now use `torch.tensor` instead of `torch.FloatTensor`), a1 should be passed as log-probabilities.

Best regards

Thomas
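The batch interpretation matters once the reduction divides by the batch size. A sketch with `reduction='batchmean'` (available in later PyTorch releases), which divides the summed terms by `input.size(0)`:

```python
import torch
import torch.nn.functional as F

p = torch.tensor([0.3, 0.6, 0.1])  # target distribution (a2 in the thread)
q = torch.tensor([0.4, 0.5, 0.1])  # model distribution (a1 in the thread)

# Shape (3,): 'batchmean' divides the summed terms by size(0) == 3,
# i.e. each value is treated as its own batch entry.
kl_flat = F.kl_div(q.log(), p, reduction="batchmean")

# Shape (1, 3): one batch entry holding a 3-way distribution,
# so 'batchmean' divides by 1 and returns KL(p || q) itself.
kl_batched = F.kl_div(q.log().unsqueeze(0), p.unsqueeze(0), reduction="batchmean")

print(kl_flat.item(), kl_batched.item())  # about 0.0077 and 0.0231
```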

Lin_Jia
(Lin Jia)
October 8, 2020, 12:11am
#14
I was calculating a KL divergence loss and it came out negative, which led me here. Thanks for all your previous answers.

As for the mathematical proof that KLDLoss is non-negative, its cornerstone is that, for KLDLoss(p, q), sum(q) must equal 1. So even if you use p = log_softmax(tensor), you can still get negative values if your target is not a true distribution, i.e. sum(q) != 1.
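One standard way to complete that argument (not necessarily the proof that was originally posted): write p_i for the model probabilities (the exponentiated log_softmax input, so sum(p) = 1), q_i ≥ 0 for the target, and L = sum_i q_i (log q_i − log p_i) for the summed loss. The inequality log x ≤ x − 1 then gives:

```latex
\begin{aligned}
-\mathcal{L} &= \sum_{i:\,q_i>0} q_i \log\frac{p_i}{q_i}
  \le \sum_{i:\,q_i>0} q_i\left(\frac{p_i}{q_i} - 1\right)
  = \sum_{i:\,q_i>0} p_i - \sum_i q_i
  \le 1 - \sum_i q_i, \\
\mathcal{L} &\ge \sum_i q_i - 1 .
\end{aligned}
```

So the loss is non-negative when sum(q) = 1. Writing q = c·q̃ with q̃ a proper distribution gives L = c·KL(q̃‖p) + c·log c, so with a non-negative target, negative values can only appear when sum(q) < 1.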


I am now using PyTorch 1.6, and the result is 0.0231 instead of 2.3088.

tom
(Thomas V)
January 5, 2021, 7:47pm
#16
Yeah, I think I might have missed an e-2 in my post.

ube
(Mubarek Mohammed)
November 9, 2022, 12:24pm
#17
```
import numpy as np

f = np.array([5.96157837e-23, 1.00000000e+00, 1.22259395e-20, 1.33735148e-23,
              4.61867503e-18, 1.12406575e-20, 1.61654594e-18, 4.05129215e-16,
              4.42485638e-19, 2.47655932e-18])
g = np.array([6.6305534e-17, 1.0000000e+00, 1.1786218e-13, 2.3428323e-22,
              5.9469584e-13, 5.7867524e-19, 5.0707482e-17, 4.4700832e-13,
              1.3639888e-15, 1.2760710e-15])
# KL(a || b) = sum_i a_i * log(a_i / b_i)
div_measure = lambda a, b: np.multiply(a, np.log(np.divide(a, b))).sum()
print(sum(f), sum(g))
print(div_measure(f, g))
```

This will output about -2.917560e-15.
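Printing the result in scientific notation shows it is on the order of -2.9e-15, i.e. floating-point noise rather than a meaningful negative divergence: f[1] and g[1] are both exactly 1, so that term is 0, every other entry is below 1e-12, and sum(g) exceeds 1 by roughly 1e-12. For non-negative f and g, log x ≤ x − 1 gives KL(f‖g) ≥ sum(f) − sum(g), so a tiny negative value of that size is allowed. A quick check:

```python
import numpy as np

f = np.array([5.96157837e-23, 1.00000000e+00, 1.22259395e-20, 1.33735148e-23,
              4.61867503e-18, 1.12406575e-20, 1.61654594e-18, 4.05129215e-16,
              4.42485638e-19, 2.47655932e-18])
g = np.array([6.6305534e-17, 1.0000000e+00, 1.1786218e-13, 2.3428323e-22,
              5.9469584e-13, 5.7867524e-19, 5.0707482e-17, 4.4700832e-13,
              1.3639888e-15, 1.2760710e-15])

kl = np.sum(f * np.log(f / g))  # KL(f || g)
bound = f.sum() - g.sum()       # lower bound on KL(f || g), about -1.2e-12

print(f"{kl:.6e}", f"{bound:.6e}")  # note the exponents
```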

bkoyuncu
(bk)
December 8, 2022, 11:53pm
#18
These do not sum exactly to 1.
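By contrast, a target that genuinely fails to sum to 1 can give clearly negative values even with a correct log_softmax input. A sketch where the target equals the prediction but is scaled by a hypothetical factor of 0.5:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

scores = torch.tensor([[1.0, 2.0, 3.0]])  # hypothetical model outputs
log_pred = F.log_softmax(scores, dim=-1)

good_target = log_pred.exp()     # proper distribution, equal to the prediction
bad_target = 0.5 * good_target   # sums to 0.5, so not a distribution

kl = nn.KLDivLoss(reduction="sum")
good_loss = kl(log_pred, good_target).item()  # about 0
bad_loss = kl(log_pred, bad_target).item()    # 0.5 * log(0.5), about -0.3466

print(good_loss, bad_loss)
```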