WERush
(Xinge)
April 22, 2018, 8:43am
1
When I use nn.KLDivLoss(), the KL divergence comes out negative. For example,
a1 = Variable(torch.FloatTensor([0.1,0.2]))
a2 = Variable(torch.FloatTensor([0.3, 0.6]))
a3 = Variable(torch.FloatTensor([0.3, 0.6]))
a4 = Variable(torch.FloatTensor([-0.3, -0.6]))
a5 = Variable(torch.FloatTensor([-0.3, -0.6]))
c1 = nn.KLDivLoss()(a1,a2) #==> -0.4088
c2 = nn.KLDivLoss()(a2,a3) #==> -0.5588
c3 = nn.KLDivLoss()(a4,a5) #==> 0
c4 = nn.KLDivLoss()(a3,a4) #==> 0
c5 = nn.KLDivLoss()(a1,a4) #==> 0
In theory, the KL divergence should not be negative. Can anyone explain this?
Thanks.
Your variables a1, a2, a3, a4, a5 are not proper distributions, i.e. they don't sum up to 1.
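A quick way to see this is to check the sums; one way to turn them into proper distributions (just a sketch with the current tensor API, using softmax over the last dimension — dividing by the sum would work too) is:
import torch
import torch.nn.functional as F

a1 = torch.tensor([0.1, 0.2])
a2 = torch.tensor([0.3, 0.6])
print(a1.sum(), a2.sum())   # 0.3 and 0.9 -- neither sums to 1

# one possible fix: normalize with softmax so each tensor sums to 1
p1 = F.softmax(a1, dim=-1)
p2 = F.softmax(a2, dim=-1)
print(p1.sum(), p2.sum())   # both 1.0 (up to floating-point rounding)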
WERush
(Xinge)
April 22, 2018, 11:55am
3
a2 = Variable(torch.FloatTensor([0.3, 0.6, 0.1]))
a1 = Variable(torch.FloatTensor([0.4, 0.5, 0.1]))
c1 = nn.KLDivLoss()(a1,a2) #==> -0.4426
tom
(Thomas V)
April 22, 2018, 11:59am
4
They still aren’t distributions.
Keep in mind that the loss functions take batches. So you’d want to unsqueeze(0).
Best regards
Thomas
WERush
(Xinge)
April 22, 2018, 12:02pm
5
a1.unsqueeze_(0)
a2.unsqueeze_(0)
c1 = nn.KLDivLoss()(a1,a2) #==> -0.4426
tom
(Thomas V)
April 22, 2018, 1:07pm
6
…and x (the input, but not y, the target) should be log probabilities, says the doc. My apologies for missing that at first.
Best regards
Thomas
Edit: so, back at the (PyTorch 0.4) computer,
import torch
a2 = torch.FloatTensor([0.3, 0.6, 0.1])
a1 = torch.FloatTensor([0.4, 0.5, 0.1])
c1 = torch.nn.KLDivLoss(size_average=False)(a1.log(),a2)
print(c1)
gives 2.3088, the same as explicitly computing
(a2*(a2.log()-a1.log())).sum()
Note that PyTorch erroneously averages the terms by default rather than summing them.
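On current versions this behaviour is controlled by the reduction argument instead of size_average; roughly, the options behave like this on the same tensors (a sketch):
import torch
import torch.nn as nn

a2 = torch.tensor([[0.3, 0.6, 0.1]])   # target distribution, batch of 1
a1 = torch.tensor([[0.4, 0.5, 0.1]])   # input, passed below as log-probabilities

for reduction in ("sum", "mean", "batchmean"):
    loss = nn.KLDivLoss(reduction=reduction)(a1.log(), a2)
    print(reduction, loss.item())
# "sum" adds all element-wise terms,
# "mean" divides that sum by the number of elements (3 here),
# "batchmean" divides it by the batch size (1 here) and is the option
# that matches the mathematical definition of KL divergence per sample.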
WERush
(Xinge)
April 23, 2018, 2:42am
8
OK, I get it. Thanks very much.
Pei_Wang
(wynonna)
October 22, 2019, 3:55am
9
I still get negative output
Same. I created a post over here since my negative KL divergence values might be caused by the output of torch.nn.functional.softmax producing vectors that don’t sum to 1.
I started receiving negative KL divergences between a target Dirichlet distribution and my model’s output Dirichlet distribution. Someone online suggested that this might be indicative that the parameters of the Dirichlet distribution don’t sum to 1. I thought this was ridiculous since the output of the model is passed through
output = F.softmax(self.weights(x), dim=1)
But after looking into it more closely, I found that torch.all(torch.sum(output, dim=1) == 1.) returns False! Looking at the …
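A small self-contained way to reproduce that kind of check (random logits just as a stand-in for the real model output):
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 5)          # stand-in for self.weights(x)
output = F.softmax(logits, dim=1)

row_sums = output.sum(dim=1)
print(torch.all(row_sums == 1.0))                           # frequently False in float32
print(torch.allclose(row_sums, torch.ones_like(row_sums)))  # True: off only by ~1e-7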
MFajcik1
(Martin Fajčík)
January 20, 2020, 10:02am
11
Did you normalize the values with log_softmax?
torch.nn.KLDivLoss(size_average=False)(F.log_softmax(scores, -1), targets)
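Since size_average is deprecated on newer releases, here is an equivalent sketch with reduction, also making sure the target is a probability distribution (scores and target_scores are made-up names for raw logits):
import torch
import torch.nn as nn
import torch.nn.functional as F

scores = torch.randn(4, 10)          # hypothetical model logits
target_scores = torch.randn(4, 10)   # hypothetical target logits

loss = nn.KLDivLoss(reduction="sum")(
    F.log_softmax(scores, dim=-1),     # input must be log-probabilities
    F.softmax(target_scores, dim=-1),  # target must be probabilities
)

# or keep the target in log-space as well (newer PyTorch versions):
loss_log_target = nn.KLDivLoss(reduction="sum", log_target=True)(
    F.log_softmax(scores, dim=-1),
    F.log_softmax(target_scores, dim=-1),
)
print(loss.item(), loss_log_target.item())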
Hi Tom. May I ask why those values are still not distributions? Thanks.
tom
(Thomas V)
May 21, 2020, 11:42am
13
Because, without the extra batch dimension, a2 is treated as a batch of three separate values 0.3, 0.6, 0.1 rather than as a single sample that is a distribution over three values.
Also, as noted in the post below (which has the correct use except for not using torch.tensor
as one would now), a1 should be passed as log probabilities.
Best regards
Thomas
Lin_Jia
(Lin Jia)
October 8, 2020, 12:11am
14
I was calculating the KL divergence loss and it came out negative, which led me here. Thanks for all your previous answers.
This is the mathematical proof of why KLDivLoss should be non-negative:
The cornerstone of the proof is that for KLDivLoss(p, q), sum(q) needs to equal one to guarantee the loss is non-negative. So even if you have p = log_softmax(tensor), you can still get negative values if your target is not a true distribution, i.e. sum(q) != 1.
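A sketch of the standard argument, writing q for the target distribution and p for the predicted probabilities (the loss receives log p as input), assuming q_i >= 0, sum_i q_i = 1 and sum_i p_i <= 1:
-\mathrm{KL}(q \,\|\, p) = \sum_i q_i \log\frac{p_i}{q_i}
    \le \log \sum_i q_i \frac{p_i}{q_i}
    = \log \sum_i p_i \le \log 1 = 0
so \mathrm{KL}(q \,\|\, p) \ge 0. The inequality is Jensen's inequality for the concave logarithm, and that is exactly the step that requires sum(q) = 1; if the target does not sum to one, the bound breaks and the computed loss can come out negative.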
I am now using PyTorch 1.6. The result is 0.0231 instead of 2.3088.
tom
(Thomas V)
January 5, 2021, 7:47pm
16
Yeah, I think I might have missed an e-2 in my post.
ube
(Mubarek Mohammed)
November 9, 2022, 12:24pm
17
import numpy as np

f = np.array([5.96157837e-23, 1.00000000e+00, 1.22259395e-20, 1.33735148e-23,
              4.61867503e-18, 1.12406575e-20, 1.61654594e-18, 4.05129215e-16,
              4.42485638e-19, 2.47655932e-18])
g = np.array([6.6305534e-17, 1.0000000e+00, 1.1786218e-13, 2.3428323e-22, 5.9469584e-13,
              5.7867524e-19, 5.0707482e-17, 4.4700832e-13, 1.3639888e-15, 1.2760710e-15])

# element-wise f * log(f / g), summed up: KL(f || g)
div_measure = lambda a, b: np.multiply(a, np.log(np.divide(a, b))).sum()

print(f.sum(), g.sum())
print(div_measure(f, g))
This will output about -2.917560e-15.
bkoyuncu
(bk)
December 8, 2022, 11:53pm
18
These clearly don't sum to exactly 1.
Wu_Zq
(Wu Zq)
May 27, 2024, 9:08am
19
"To avoid underflow issues when computing this quantity, this loss expects the argument input in the log-space." (KLDivLoss — PyTorch 2.3 documentation)
Here is an example:
import numpy as np
import torch
import torch.nn as nn

# KL divergence
# D(P,Q): P = true probability distribution, Q = predicted probability distribution
# D(P,Q) = Σ (-P * log Q + P * log P), summed over all x, where P = P(x), Q = Q(x)
# https://pytorch.org/docs/stable/generated/torch.nn.KLDivLoss.html

def test1():
    targets = [0.2, 0.3, 0.5]  # P - true
    inputs = [0.5, 0.3, 0.2]   # Q - predicted
    loss_fn = nn.KLDivLoss(reduction='sum')
    log_inputs = np.log(inputs)  # note: torch assumes the inputs are already in log-space
    loss = loss_fn(torch.tensor(log_inputs), torch.tensor(targets))
    print(f'loss_1={loss}')
    # manual computation of the same quantity
    loss_2 = 0
    for idx, px in enumerate(targets):
        qx = inputs[idx]
        loss_2 += px * np.log(px) - px * np.log(qx)
    print(f'loss_2={loss_2}')
    # loss_1 = 0.2748872074901785
    # loss_2 = 0.2748872195622465

if __name__ == '__main__':
    test1()
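For a cross-check (a sketch, not part of the example above), the same value can also be obtained from torch.distributions, whose kl_divergence(P, Q) computes sum(P * (log P - log Q)), i.e. the same quantity:
import torch
from torch.distributions import Categorical, kl_divergence

P = Categorical(probs=torch.tensor([0.2, 0.3, 0.5]))  # true distribution (targets)
Q = Categorical(probs=torch.tensor([0.5, 0.3, 0.2]))  # predicted distribution (inputs)
print(kl_divergence(P, Q))  # ~0.2749, matching loss_1 and loss_2 above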