WERush (Xinge) | April 22, 2018, 8:43am | #1
When I use nn.KLDivLoss(), the KL divergence gives negative values. For example,
a1 = Variable(torch.FloatTensor([0.1,0.2]))
a2 = Variable(torch.FloatTensor([0.3, 0.6]))
a3 = Variable(torch.FloatTensor([0.3, 0.6]))
a4 = Variable(torch.FloatTensor([-0.3, -0.6]))
a5 = Variable(torch.FloatTensor([-0.3, -0.6]))
c1 = nn.KLDivLoss()(a1,a2) #==> -0.4088
c2 = nn.KLDivLoss()(a2,a3) #==> -0.5588
c3 = nn.KLDivLoss()(a4,a5) #==> 0
c4 = nn.KLDivLoss()(a3,a4) #==> 0
c5 = nn.KLDivLoss()(a1,a4) #==> 0
 
In theory, the KL divergence should not be negative. Can anyone explain that?
Thanks.

Your variables a1, a2, a3, a4, a5 are not proper probability distributions, i.e. they don't sum up to 1.
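For illustration, here is a minimal sketch (my own, not from the thread) of one way to get a proper, non-negative value: normalize both tensors into distributions and pass log-probabilities as the first argument, as the replies below explain in more detail.
import torch
import torch.nn.functional as F

p_scores = torch.tensor([0.1, 0.2])
q_scores = torch.tensor([0.3, 0.6])
log_p = F.log_softmax(p_scores, dim=0)    # input must be log-probabilities
q = F.softmax(q_scores, dim=0)            # target must be a distribution (sums to 1)
kl = F.kl_div(log_p, q, reduction='sum')  # KL(q || p), non-negative
print(kl)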

WERush (Xinge) | April 22, 2018, 11:55am | #3
              a2 = Variable(torch.FloatTensor([0.3, 0.6, 0.1]))
a1 = Variable(torch.FloatTensor([0.4, 0.5, 0.1]))
c1 = nn.KLDivLoss()(a1,a2) #==> -0.4426

tom (Thomas V) | April 22, 2018, 11:59am | #4
              They still aren’t distributions.  
Keep in mind that the loss functions take batches. So you’d want to unsqueeze(0).
Best regards
Thomas

WERush (Xinge) | April 22, 2018, 12:02pm | #5
              a1.unsqueeze_(0)
a2.unsqueeze_(0)
c1 = nn.KLDivLoss()(a1,a2) #==> -0.4426 

tom (Thomas V) | April 22, 2018, 1:07pm | #6
…and x (but not y) should be log probabilities, says the doc. My apologies for missing that at first.
Best regards
Thomas
Edit: so, back at the (PyTorch 0.4) computer,
import torch
a2 = torch.FloatTensor([0.3, 0.6, 0.1])
a1 = torch.FloatTensor([0.4, 0.5, 0.1])
c1 = torch.nn.KLDivLoss(size_average=False)(a1.log(), a2)
print(c1)

gives 2.3088, the same as explicitly computing
(a2 * (a2.log() - a1.log())).sum()

Note that PyTorch erroneously averages the terms by default rather than summing them.
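On a current PyTorch, roughly the same computation can be written with the reduction argument (size_average is deprecated); this is a sketch, not part of the original reply. 'sum' reproduces the explicit formula above, while the default 'mean' divides by the number of elements.
import torch

a2 = torch.tensor([0.3, 0.6, 0.1])
a1 = torch.tensor([0.4, 0.5, 0.1])
kl_sum = torch.nn.KLDivLoss(reduction='sum')(a1.log(), a2)    # matches (a2 * (a2.log() - a1.log())).sum()
kl_mean = torch.nn.KLDivLoss(reduction='mean')(a1.log(), a2)  # default: kl_sum / number of elements
print(kl_sum, kl_mean)  # kl_sum ≈ 0.0231, kl_mean ≈ 0.0077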

WERush (Xinge) | April 23, 2018, 2:42am | #8
              OK, I get it. Thanks very much.

Pei_Wang (wynonna) | October 22, 2019, 3:55am | #9
I still get a negative output.

Same here. I created a post over here, since my negative KL divergence values might be caused by the output of torch.nn.functional.softmax producing vectors that don't sum to 1. From that post:
I started receiving negative KL divergences between a target Dirichlet distribution and my model's output Dirichlet distribution. Someone online suggested that this might indicate that the parameters of the Dirichlet distribution don't sum to 1. I thought this was ridiculous, since the output of the model is passed through
output = F.softmax(self.weights(x), dim=1)
But after looking into it more closely, I found that torch.all(torch.sum(output, dim=1) == 1.) returns False! Looking at the …
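To illustrate that point with a quick sketch (hypothetical tensors, not from the linked post): softmax outputs sum to 1 only up to floating-point error, so an exact equality check can fail while a tolerance check passes.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)
probs = F.softmax(logits, dim=1)
row_sums = probs.sum(dim=1)
print(torch.all(row_sums == 1.0))               # may be False: exact float32 equality is fragile
print(torch.allclose(row_sums, torch.ones(4)))  # True within floating-point tolerance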

MFajcik1 (Martin Fajčík) | January 20, 2020, 10:02am | #11
Did you normalize the values with log_softmax?
torch.nn.KLDivLoss(size_average=False)(F.log_softmax(scores, -1), targets)
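A runnable sketch of that suggestion (scores and targets here are placeholder tensors, and reduction='sum' replaces the deprecated size_average=False):
import torch
import torch.nn.functional as F

scores = torch.randn(2, 5)                      # raw model outputs (placeholder)
targets = F.softmax(torch.randn(2, 5), dim=-1)  # target rows are proper distributions
loss = torch.nn.KLDivLoss(reduction='sum')(F.log_softmax(scores, dim=-1), targets)
print(loss)  # non-negative, since both arguments are properly normalized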

Hi Tom. May I ask why those values are still not distributions? Thanks.

tom (Thomas V) | May 21, 2020, 11:42am | #13
Because they are treated as a batch of three separate items, with values 0.3, 0.6, 0.1 individually for a2, rather than as a single batch entry with three values.
Also, as noted in the post below (which shows the correct usage, except for not using torch.tensor as one would now), a1 should be passed as log probabilities.
Best regards
Thomas
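A minimal sketch of the distinction (my own illustration): with unsqueeze(0) the three values form one row, i.e. a single distribution over three categories, and passing its log as the input gives the expected non-negative result.
import torch
import torch.nn.functional as F

a1 = torch.tensor([0.4, 0.5, 0.1])
a2 = torch.tensor([0.3, 0.6, 0.1])
p = a1.unsqueeze(0)  # shape (1, 3): one sample, three categories
q = a2.unsqueeze(0)
kl = F.kl_div(p.log(), q, reduction='batchmean')  # sums over categories, averages over the batch of 1
print(kl)  # ≈ 0.0231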

Lin_Jia (Lin Jia) | October 8, 2020, 12:11am | #14
I was computing the KL divergence loss, got a negative value, and that led me here. Thanks for all the previous answers.
Here is the mathematical reason why the KL divergence loss should be non-negative:
The cornerstone of the proof is that for KLDivLoss(p, q), sum(q) needs to equal one for the loss to be non-negative. So even if you have p = log_softmax(tensor), you can still get negative values if your target is not a true distribution, i.e. sum(q) != 1.
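A small numeric check of that point (my own example): with a proper log-probability input but a target that sums to less than 1, the loss comes out negative.
import torch
import torch.nn.functional as F

log_p = torch.log(torch.tensor([0.5, 0.5]))     # proper log-probabilities
q_bad = torch.tensor([0.2, 0.2])                # sums to 0.4: not a distribution
q_ok = torch.tensor([0.5, 0.5])                 # proper distribution

print(F.kl_div(log_p, q_bad, reduction='sum'))  # ≈ -0.3665, negative
print(F.kl_div(log_p, q_ok, reduction='sum'))   # 0.0, identical distributions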

I am now using PyTorch 1.6. The result is 0.0231 instead of 2.3088.

tom (Thomas V) | January 5, 2021, 7:47pm | #16
Yeah, I think I might have missed an e-2 in my post.

ube (Mubarek Mohammed) | November 9, 2022, 12:24pm | #17
import numpy as np

f = np.array([5.96157837e-23, 1.00000000e+00, 1.22259395e-20, 1.33735148e-23,
              4.61867503e-18, 1.12406575e-20, 1.61654594e-18, 4.05129215e-16,
              4.42485638e-19, 2.47655932e-18])
g = np.array([6.6305534e-17, 1.0000000e+00, 1.1786218e-13, 2.3428323e-22, 5.9469584e-13,
              5.7867524e-19, 5.0707482e-17, 4.4700832e-13, 1.3639888e-15, 1.2760710e-15])
div_measure = lambda a, b: np.multiply(a, np.log(np.divide(a, b))).sum()
print(sum(f), sum(g))
print(div_measure(f, g))

This will output about -2.91756e-15.

bkoyuncu (bk) | December 8, 2022, 11:53pm | #18
These do not clearly sum up to 1.

Wu_Zq (Wu Zq) | May 27, 2024, 9:08am | #19
To avoid underflow issues when computing this quantity, this loss expects the argument input in log-space (see the KLDivLoss documentation, PyTorch 2.3).
Here is an example:
import numpy as np
import torch
import torch.nn as nn
# KL divergence
# D(P,Q): P = true probability distribution, Q = predicted probability distribution
# D(P,Q) = Σ (-P * logQ + P * logP), summed over all x, where P = P(x), Q = Q(x)
# https://pytorch.org/docs/stable/generated/torch.nn.KLDivLoss.html
def test1():
    targets = [0.2, 0.3, 0.5]  # P-true
    inputs = [0.5, 0.3, 0.2]  # P-predict
    loss_fn = nn.KLDivLoss(reduction='sum')
    log_inputs = np.log(inputs)  # note: in torch, KLDivLoss expects the input to already be in log-space
    loss = loss_fn(torch.tensor(log_inputs), torch.tensor(targets))
    print(f'loss_1={loss}')
    # manual
    loss_2 = 0
    for idx, px in enumerate(targets):
        qx = inputs[idx]
        loss_2 += px * np.log(px) - px * np.log(qx)
    print(f'loss_2={loss_2}')
    # loss_1 = 0.2748872074901785
    # loss_2 = 0.2748872195622465
if __name__ == '__main__':
    test1()