import torch
import torch.nn as nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self, channel, classes=10):
        super(Network, self).__init__()
        self.fc = nn.Linear(channel, classes)

    def forward(self, x):
        x = self.fc(x)
        return x

channel = 32                      # example feature size, any value works
input = torch.randn(16, channel)  # batch of 16 feature vectors
net = Network(channel)
out = net(input)
# out.shape == [16, 10]

Q = out
P = out
print((P * (P / Q).log()).sum())          # prints 0
print(F.kl_div(Q, P, None, None, 'sum'))  # prints a negative value
I heard that (P * (P / Q).log()).sum() is equal to F.kl_div(Q.log(), P, None, None, 'sum').

With the test code above, (P * (P / Q).log()).sum() returns 0 when P and Q are the same, but F.kl_div returns a negative value.

What's the correct way to compute the KL divergence?
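For reference, this is a minimal check of the equivalence I heard about, under the assumption that both P and Q should be proper probability distributions (I apply softmax to random logits here; the 16x10 shape just mirrors my output above):

import torch
import torch.nn.functional as F

# Turn raw logits into probability distributions along the class dimension.
logits = torch.randn(16, 10)
P = F.softmax(logits, dim=1)
Q = F.softmax(logits, dim=1)

# Manual KL(P || Q).
manual = (P * (P / Q).log()).sum()

# F.kl_div with log-probabilities as the first argument.
builtin = F.kl_div(Q.log(), P, None, None, 'sum')

print(manual)   # tensor(0.)
print(builtin)  # tensor(0.) as well, since P and Q are identical here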