[Solved] Self-implemented KL-Divergence accuracy doesn't match the built-in one

Hello, everyone,

To figure out how to implement a new loss function, I first implemented a KL-divergence loss. The code is below (the difference from the built-in one is that this one includes the log_softmax layer).

import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class KLDLoss(nn.Module):

    def __init__(self, size_average=True):
        super(KLDLoss, self).__init__()
        self.size_average = size_average

    def forward(self, inputs, targets):
        N = inputs.size(0)
        C = inputs.size(1)
        logP = F.log_softmax(inputs)

        class_mask = inputs.data.new(N, C).fill_(0)
        class_mask = Variable(class_mask)
        targets[targets==0] = 2 # Avoid log(0)

        probs = (logP.exp()*class_mask).sum(1).view(-1,1)     
        batch_loss = (targets * (targets.log() - logP) * class_mask).sum(1)

        if self.size_average:
            loss = batch_loss.mean()
        else:
            loss = batch_loss.sum()
        return loss

However, while the built-in KLD loss reaches 87% accuracy, this one only reaches 78%. I wonder whether something is wrong in my implementation, or whether I should implement the backpropagation part myself (without using autograd).


Solved: size_average should also be performed across the output dimension.


Hi Wang, thanks for sharing. What do you mean by ‘size_average should also be performed across the output dimension’? Also, could you please provide the corrected version of the KL divergence loss code? Thanks a lot.
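
One possible reading of the fix, as a minimal sketch rather than the original poster's actual corrected code: the built-in KL-divergence loss with mean reduction divides the summed loss by every element (N * C), whereas summing over classes and then taking the batch mean divides only by N. The version below assumes a recent PyTorch (so `dim=1` is passed explicitly and `Variable` is not needed), and it swaps the `targets[targets==0] = 2` trick for `torch.xlogy`, which returns 0 where the target is 0. The random tensors are purely illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KLDLoss(nn.Module):
    """KL divergence that takes raw logits and applies log_softmax itself."""

    def __init__(self, size_average=True):
        super().__init__()
        self.size_average = size_average

    def forward(self, inputs, targets):
        # inputs: (N, C) raw scores; targets: (N, C) probabilities.
        log_p = F.log_softmax(inputs, dim=1)
        # xlogy(t, t) is t * log(t), defined as 0 where t == 0, so log(0)
        # never appears and targets is not modified in place.
        pointwise = torch.xlogy(targets, targets) - targets * log_p
        if self.size_average:
            # Average over every element (N * C), not only over the batch.
            return pointwise.mean()
        return pointwise.sum()


# Illustrative data only: 4 samples, 3 classes.
torch.manual_seed(0)
logits = torch.randn(4, 3)
targets = F.softmax(torch.randn(4, 3), dim=1)

custom = KLDLoss()(logits, targets)
# Built-in loss with element-wise mean (recent PyTorch warns that this
# is not the per-sample KL divergence; "batchmean" gives that instead).
builtin = F.kl_div(F.log_softmax(logits, dim=1), targets, reduction="mean")
# Summing over classes and dividing by the batch size only.
batch_only = KLDLoss(size_average=False)(logits, targets) / logits.size(0)
```

Here `custom` should agree with `builtin`, while `batch_only` is C times larger. A loss scaled by C also scales the gradients by C, which acts like a larger learning rate; that may account for the accuracy gap, although the thread does not confirm it.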