KL Divergence produces negative values

When I use nn.KLDivLoss(), the KL divergence comes out negative. For example,

a1 = Variable(torch.FloatTensor([0.1,0.2]))
a2 = Variable(torch.FloatTensor([0.3, 0.6]))
a3 = Variable(torch.FloatTensor([0.3, 0.6]))
a4 = Variable(torch.FloatTensor([-0.3, -0.6]))
a5 = Variable(torch.FloatTensor([-0.3, -0.6]))

c1 = nn.KLDivLoss()(a1,a2) #==> -0.4088
c2 = nn.KLDivLoss()(a2,a3) #==> -0.5588
c3 = nn.KLDivLoss()(a4,a5) #==> 0
c4 = nn.KLDivLoss()(a3,a4) #==> 0
c5 = nn.KLDivLoss()(a1,a4) #==> 0

In theory, the KL divergence should not be negative. Can anyone explain this?

Thanks.

Your variables a1, a2, a3, a4, a5 are not proper distributions, i.e. they don’t sum to 1.

a2 = Variable(torch.FloatTensor([0.3, 0.6, 0.1]))
a1 = Variable(torch.FloatTensor([0.4, 0.5, 0.1]))

c1 = nn.KLDivLoss()(a1,a2) #==> -0.4426

They still aren’t distributions. 🙂
Keep in mind that the loss functions take batches. So you’d want to unsqueeze(0).

Best regards

Thomas

a1.unsqueeze_(0)
a2.unsqueeze_(0)

c1 = nn.KLDivLoss()(a1,a2) #==> -0.4426 

…and x (but not y) should be log-probabilities, according to the docs. My apologies for missing that at first.

Best regards

Thomas
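
To see where the negative numbers come from, here is a minimal pure-Python sketch of the pointwise term that the KLDivLoss documentation describes, y * (log(y) - x), averaged over all elements as the default setting does. The helper name kldivloss_mean is made up for illustration; it is not a PyTorch function.

```python
import math

def kldivloss_mean(x, y):
    """Mean of y * (log(y) - x) over all elements.

    x is used as given, i.e. it is assumed to already hold log-probabilities.
    Hypothetical helper mirroring the documented formula, not PyTorch code.
    """
    return sum(yi * (math.log(yi) - xi) for xi, yi in zip(x, y)) / len(y)

a1 = [0.4, 0.5, 0.1]
a2 = [0.3, 0.6, 0.1]

# Raw probabilities as input: reproduces the negative value reported above.
print(round(kldivloss_mean(a1, a2), 4))  # -0.4426

# Log-probabilities as input: the result is no longer negative.
log_a1 = [math.log(v) for v in a1]
print(kldivloss_mean(log_a1, a2) >= 0)  # True
```

The same helper also reproduces the -0.4088 from the original question when called with [0.1, 0.2] and [0.3, 0.6], which suggests the negative values come from feeding probabilities where log-probabilities are expected.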

Edit: now that I am back at a computer (PyTorch 0.4):

import torch
a2 = torch.FloatTensor([0.3, 0.6, 0.1])
a1 = torch.FloatTensor([0.4, 0.5, 0.1])
c1 = torch.nn.KLDivLoss(size_average=False)(a1.log(),a2)
print(c1)

gives 2.3088, the same as explicitly computing

(a2*(a2.log()-a1.log())).sum()

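Note that by default PyTorch averages the terms over all elements rather than summing them, so the default result does not match the explicit sum; that is why size_average=False is passed above.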

OK, I get it. Thanks very much.

I still get negative output

Same. I created a post over here since my negative KL divergence values might be caused by the output of torch.nn.functional.softmax producing vectors that don’t sum to 1.

Did you normalize the values with log_softmax?

torch.nn.KLDivLoss(size_average=False)(F.log_softmax(scores, -1), targets)

Hi Tom. May I ask why those values are still not distributions? Thanks.

Because a2 is treated as a batch of three separate items with the values 0.3, 0.6, 0.1, rather than a batch containing one distribution over three values.
Also, as noted in the post below (which has the correct usage, except for not using torch.tensor as one would now), a1 should be passed as log-probabilities.

Best regards

Thomas
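
For readers on a recent PyTorch version, the two points above (a batch dimension and log-probabilities for the input) might be put together roughly as follows. This is only a sketch, assuming a version where the reduction argument has replaced size_average; the variable names are mine and the values are the ones used earlier in this thread.

```python
import torch
import torch.nn.functional as F

# One distribution over three values, with a leading batch dimension of 1.
p_input = torch.tensor([[0.4, 0.5, 0.1]])
q_target = torch.tensor([[0.3, 0.6, 0.1]])

# The input must be log-probabilities; the target stays as probabilities.
kl_sum = F.kl_div(p_input.log(), q_target, reduction="sum")

# The same quantity written out explicitly.
kl_manual = (q_target * (q_target.log() - p_input.log())).sum()

# "batchmean" divides the sum by the batch size (1 here);
# "mean" divides by the total number of elements (3 here).
kl_batchmean = F.kl_div(p_input.log(), q_target, reduction="batchmean")
kl_mean = F.kl_div(p_input.log(), q_target, reduction="mean")

print(kl_sum.item(), kl_manual.item(), kl_batchmean.item(), kl_mean.item())
```

If the input starts out as raw scores rather than probabilities, F.log_softmax(scores, dim=-1) yields the log-probabilities directly.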

I was calculating the KL divergence loss and it came out negative, which led me here. Thanks for all your previous answers.

There is a mathematical proof of why KLDLoss should be non-negative.

The cornerstone of the proof is that for KLDLoss(p, q), sum(q) needs to equal one to guarantee that the loss is non-negative. So even if you have p = log_softmax(tensor), you might still get negative values if your target is not a true distribution: sum(q) != 1
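
For reference, a sketch of the standard argument, in notation that is my assumption rather than the poster's: let $\hat{p}_i = e^{p_i}$ be the probabilities whose logs are passed as input, let $q_i > 0$ be the target, and use $\log x \le x - 1$.

```latex
\sum_i q_i \log\frac{q_i}{\hat{p}_i}
  = -\sum_i q_i \log\frac{\hat{p}_i}{q_i}
  \ge -\sum_i q_i \left( \frac{\hat{p}_i}{q_i} - 1 \right)
  = \sum_i q_i - \sum_i \hat{p}_i
```

With log_softmax on the input, $\sum_i \hat{p}_i = 1$, so the loss is bounded below by $\sum_i q_i - 1$. That bound is zero when the target sums to one, and it drops below zero when the target sums to less than one, which leaves room for negative values.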

I am now using pytorch 1.6. The result is 0.0231 instead of 2.3088.

Yeah, I think I missed an e-2 in my post: the value should be 2.3088e-2.
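
The corrected figure is easy to check by hand in plain Python, using the same two vectors as in the earlier post:

```python
import math

a1 = [0.4, 0.5, 0.1]  # input distribution (its log is what the loss receives)
a2 = [0.3, 0.6, 0.1]  # target distribution

# KL(a2 || a1) = sum_i a2_i * (log a2_i - log a1_i)
kl = sum(q * (math.log(q) - math.log(p)) for p, q in zip(a1, a2))
print(f"{kl:.4f}")  # 0.0231
```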

import numpy as np

f = np.array([5.96157837e-23, 1.00000000e+00, 1.22259395e-20, 1.33735148e-23,
              4.61867503e-18, 1.12406575e-20, 1.61654594e-18, 4.05129215e-16,
              4.42485638e-19, 2.47655932e-18])
g = np.array([6.6305534e-17, 1.0000000e+00, 1.1786218e-13, 2.3428323e-22, 5.9469584e-13,
              5.7867524e-19, 5.0707482e-17, 4.4700832e-13, 1.3639888e-15, 1.2760710e-15])

# Sum of a * log(a / b) over all elements
div_measure = lambda a, b: np.multiply(a, np.log(np.divide(a, b))).sum()
print(sum(f), sum(g))
print(div_measure(f, g))

This will output a tiny negative value, on the order of -2.9e-15.

These clearly do not sum exactly to 1.
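
One way to read this result: when the two vectors are not exactly normalized, the inequality log(x) <= x - 1 gives sum(f * log(f / g)) >= sum(f) - sum(g) rather than >= 0, so a tiny negative value is allowed whenever sum(g) is slightly larger than sum(f). A sketch in plain Python with the same numbers (using math.fsum for the sums is my choice here, not part of the original snippet):

```python
import math

f = [5.96157837e-23, 1.00000000e+00, 1.22259395e-20, 1.33735148e-23,
     4.61867503e-18, 1.12406575e-20, 1.61654594e-18, 4.05129215e-16,
     4.42485638e-19, 2.47655932e-18]
g = [6.6305534e-17, 1.0000000e+00, 1.1786218e-13, 2.3428323e-22, 5.9469584e-13,
     5.7867524e-19, 5.0707482e-17, 4.4700832e-13, 1.3639888e-15, 1.2760710e-15]

# Divergence-style sum and the lower bound sum(f) - sum(g).
kl = math.fsum(fi * math.log(fi / gi) for fi, gi in zip(f, g))
bound = math.fsum(f) - math.fsum(g)

print(kl)           # tiny negative number
print(bound)        # lower bound, also negative because sum(g) > sum(f)
print(kl >= bound)  # True
```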