WERush (Xinge) | April 22, 2018, 8:43am | #1
When I use nn.KLDivLoss(), the KL divergence gives negative values. For example,
a1 = Variable(torch.FloatTensor([0.1,0.2]))
a2 = Variable(torch.FloatTensor([0.3, 0.6]))
a3 = Variable(torch.FloatTensor([0.3, 0.6]))
a4 = Variable(torch.FloatTensor([-0.3, -0.6]))
a5 = Variable(torch.FloatTensor([-0.3, -0.6]))
c1 = nn.KLDivLoss()(a1,a2) #==> -0.4088
c2 = nn.KLDivLoss()(a2,a3) #==> -0.5588
c3 = nn.KLDivLoss()(a4,a5) #==> 0
c4 = nn.KLDivLoss()(a3,a4) #==> 0
c5 = nn.KLDivLoss()(a1,a4) #==> 0
 
In theory, the KL divergence should not be negative. Can anyone explain that?
Thanks.

Your variables a1, a2, a3, a4, a5 are not proper probability distributions, i.e. they don't sum up to 1.
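For illustration, here is a minimal sketch (my own, not from the thread) of one way to get a proper, non-negative value: normalize both tensors into distributions and pass log-probabilities as the first argument, as the replies below explain in more detail.
import torch
import torch.nn.functional as F

p_scores = torch.tensor([0.1, 0.2])
q_scores = torch.tensor([0.3, 0.6])
log_p = F.log_softmax(p_scores, dim=0)    # input must be log-probabilities
q = F.softmax(q_scores, dim=0)            # target must be a distribution (sums to 1)
kl = F.kl_div(log_p, q, reduction='sum')  # KL(q || p), non-negative
print(kl)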

WERush (Xinge) | April 22, 2018, 11:55am | #3
              a2 = Variable(torch.FloatTensor([0.3, 0.6, 0.1]))
a1 = Variable(torch.FloatTensor([0.4, 0.5, 0.1]))
c1 = nn.KLDivLoss()(a1,a2) #==> -0.4426

tom (Thomas V) | April 22, 2018, 11:59am | #4
              They still aren’t distributions.  
Keep in mind that the loss functions take batches. So you’d want to unsqueeze(0).
Best regards
Thomas

WERush (Xinge) | April 22, 2018, 12:02pm | #5
              a1.unsqueeze_(0)
a2.unsqueeze_(0)
c1 = nn.KLDivLoss()(a1,a2) #==> -0.4426 

tom (Thomas V) | April 22, 2018, 1:07pm | #6
…and x (but not y) should be log probabilities, says the doc. My apologies for missing that at first.
Best regards
Thomas
Edit: so, back at the (PyTorch 0.4) computer,
import torch
a2 = torch.FloatTensor([0.3, 0.6, 0.1])
a1 = torch.FloatTensor([0.4, 0.5, 0.1])
c1 = torch.nn.KLDivLoss(size_average=False)(a1.log(), a2)
print(c1)

gives 2.3088, the same as explicitly computing
(a2 * (a2.log() - a1.log())).sum()

Note that PyTorch erroneously averages the terms by default rather than summing them.
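On a current PyTorch, roughly the same computation can be written with the reduction argument (size_average is deprecated); this is a sketch, not part of the original reply. 'sum' reproduces the explicit formula above, while the default 'mean' divides by the number of elements.
import torch

a2 = torch.tensor([0.3, 0.6, 0.1])
a1 = torch.tensor([0.4, 0.5, 0.1])
kl_sum = torch.nn.KLDivLoss(reduction='sum')(a1.log(), a2)    # matches (a2 * (a2.log() - a1.log())).sum()
kl_mean = torch.nn.KLDivLoss(reduction='mean')(a1.log(), a2)  # default: kl_sum / number of elements
print(kl_sum, kl_mean)  # kl_sum ≈ 0.0231, kl_mean ≈ 0.0077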

WERush (Xinge) | April 23, 2018, 2:42am | #8
              OK, I get it. Thanks very much.

Pei_Wang (wynonna) | October 22, 2019, 3:55am | #9
I still get a negative output.

Same here. I created a post over here, since my negative KL divergence values might be caused by the output of torch.nn.functional.softmax producing vectors that don't sum to 1. From that post:
I started receiving negative KL divergences between a target Dirichlet distribution and my model's output Dirichlet distribution. Someone online suggested that this might indicate that the parameters of the Dirichlet distribution don't sum to 1. I thought this was ridiculous, since the output of the model is passed through
output = F.softmax(self.weights(x), dim=1)
But after looking into it more closely, I found that torch.all(torch.sum(output, dim=1) == 1.) returns False! Looking at the …
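To illustrate that point with a quick sketch (hypothetical tensors, not from the linked post): softmax outputs sum to 1 only up to floating-point error, so an exact equality check can fail while a tolerance check passes.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)
probs = F.softmax(logits, dim=1)
row_sums = probs.sum(dim=1)
print(torch.all(row_sums == 1.0))               # may be False: exact float32 equality is fragile
print(torch.allclose(row_sums, torch.ones(4)))  # True within floating-point tolerance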

MFajcik1 (Martin Fajčík) | January 20, 2020, 10:02am | #11
Did you normalize the values with log_softmax?
torch.nn.KLDivLoss(size_average=False)(F.log_softmax(scores, -1), targets)
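A runnable sketch of that suggestion (scores and targets here are placeholder tensors, and reduction='sum' replaces the deprecated size_average=False):
import torch
import torch.nn.functional as F

scores = torch.randn(2, 5)                      # raw model outputs (placeholder)
targets = F.softmax(torch.randn(2, 5), dim=-1)  # target rows are proper distributions
loss = torch.nn.KLDivLoss(reduction='sum')(F.log_softmax(scores, dim=-1), targets)
print(loss)  # non-negative, since both arguments are properly normalized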

Hi Tom. May I ask why those values are still not distributions? Thanks.

tom (Thomas V) | May 21, 2020, 11:42am | #13
Because they are treated as a batch of three separate items, with values 0.3, 0.6, 0.1 individually for a2, rather than as a single batch entry with three values.
Also, as noted in the post below (which shows the correct usage, except for not using torch.tensor as one would now), a1 should be passed as log probabilities.
Best regards
Thomas
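A minimal sketch of the distinction (my own illustration): with unsqueeze(0) the three values form one row, i.e. a single distribution over three categories, and passing its log as the input gives the expected non-negative result.
import torch
import torch.nn.functional as F

a1 = torch.tensor([0.4, 0.5, 0.1])
a2 = torch.tensor([0.3, 0.6, 0.1])
p = a1.unsqueeze(0)  # shape (1, 3): one sample, three categories
q = a2.unsqueeze(0)
kl = F.kl_div(p.log(), q, reduction='batchmean')  # sums over categories, averages over the batch of 1
print(kl)  # ≈ 0.0231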

Lin_Jia (Lin Jia) | October 8, 2020, 12:11am | #14
I was computing the KL divergence loss, got a negative value, and that led me here. Thanks for all the previous answers.
Here is the mathematical reason why the KL divergence loss should be non-negative:
The cornerstone of the proof is that for KLDivLoss(p, q), sum(q) needs to equal one for the loss to be non-negative. So even if you have p = log_softmax(tensor), you can still get negative values if your target is not a true distribution, i.e. sum(q) != 1.
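A small numeric check of that point (my own example): with a proper log-probability input but a target that sums to less than 1, the loss comes out negative.
import torch
import torch.nn.functional as F

log_p = torch.log(torch.tensor([0.5, 0.5]))     # proper log-probabilities
q_bad = torch.tensor([0.2, 0.2])                # sums to 0.4: not a distribution
q_ok = torch.tensor([0.5, 0.5])                 # proper distribution

print(F.kl_div(log_p, q_bad, reduction='sum'))  # ≈ -0.3665, negative
print(F.kl_div(log_p, q_ok, reduction='sum'))   # 0.0, identical distributions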

I am now using PyTorch 1.6. The result is 0.0231 instead of 2.3088.

tom (Thomas V) | January 5, 2021, 7:47pm | #16
Yeah, I think I might have missed an e-2 in my post.

ube (Mubarek Mohammed) | November 9, 2022, 12:24pm | #17
import numpy as np

f = np.array([5.96157837e-23, 1.00000000e+00, 1.22259395e-20, 1.33735148e-23,
              4.61867503e-18, 1.12406575e-20, 1.61654594e-18, 4.05129215e-16,
              4.42485638e-19, 2.47655932e-18])
g = np.array([6.6305534e-17, 1.0000000e+00, 1.1786218e-13, 2.3428323e-22, 5.9469584e-13,
              5.7867524e-19, 5.0707482e-17, 4.4700832e-13, 1.3639888e-15, 1.2760710e-15])
div_measure = lambda a, b: np.multiply(a, np.log(np.divide(a, b))).sum()
print(sum(f), sum(g))
print(div_measure(f, g))

This will output about -2.91756e-15.

bkoyuncu (bk) | December 8, 2022, 11:53pm | #18
These do not clearly sum up to 1.

Wu_Zq (Wu Zq) | May 27, 2024, 9:08am | #19
To avoid underflow issues when computing this quantity, this loss expects the argument input in log-space (see the KLDivLoss documentation, PyTorch 2.3).
Here is an example:
import numpy as np
import torch
import torch.nn as nn
# KL divergence
# D(P,Q): P = true probability distribution, Q = predicted probability distribution
# D(P,Q) = Σ (-P * logQ + P * logP), summed over all x, where P = P(x), Q = Q(x)
# https://pytorch.org/docs/stable/generated/torch.nn.KLDivLoss.html
def test1():
    targets = [0.2, 0.3, 0.5]  # P-true
    inputs = [0.5, 0.3, 0.2]  # P-predict
    loss_fn = nn.KLDivLoss(reduction='sum')
    log_inputs = np.log(inputs)  # note: in torch, KLDivLoss expects the input to already be in log-space
    loss = loss_fn(torch.tensor(log_inputs), torch.tensor(targets))
    print(f'loss_1={loss}')
    # manual
    loss_2 = 0
    for idx, px in enumerate(targets):
        qx = inputs[idx]
        loss_2 += px * np.log(px) - px * np.log(qx)
    print(f'loss_2={loss_2}')
    # loss_1 = 0.2748872074901785
    # loss_2 = 0.2748872195622465
if __name__ == '__main__':
    test1()