Loss class implementation for KLDivLoss

I am testing the KLDivLoss implementation here: https://github.com/liuzechun/ReActNet/blob/465f9ba458b3937915e5e5613a85b74123d9ff00/utils/KD_loss.py#L8

It can be simplified to:

import torch
import torch.nn.functional as F
from torch.nn.modules.loss import _Loss


class DistributionLoss(_Loss):
    def forward(self, model_output, real_output):
        # Student logits -> log-probabilities, teacher logits -> probabilities.
        model_output_log_prob = F.log_softmax(model_output, dim=1)
        real_output_soft = F.softmax(real_output, dim=1)
        del model_output, real_output

        # Reshape to (B, 1, C) and (B, C, 1) so bmm gives one dot product per sample.
        real_output_soft = real_output_soft.unsqueeze(1)
        model_output_log_prob = model_output_log_prob.unsqueeze(2)

        # -sum_x P(x) * log Q(x), averaged over the batch.
        cross_entropy_loss = -torch.bmm(real_output_soft, model_output_log_prob)
        cross_entropy_loss = cross_entropy_loss.mean()

        return cross_entropy_loss
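
As far as I can tell, the unsqueeze/bmm combination is just a per-sample dot product, so the class reduces to a soft-target cross-entropy. A minimal sketch on random logits (the tensor names here are mine, only for illustration):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
student_logits = torch.randn(4, 10)   # stand-in for `model_output`
teacher_logits = torch.randn(4, 10)   # stand-in for `real_output`

P = F.softmax(teacher_logits, dim=1)          # P(x), the teacher distribution
log_Q = F.log_softmax(student_logits, dim=1)  # log Q(x), the student log-probabilities

# (B, 1, C) x (B, C, 1) -> (B, 1, 1): one dot product per sample
via_bmm = -torch.bmm(P.unsqueeze(1), log_Q.unsqueeze(2)).mean()
via_sum = -(P * log_Q).sum(dim=1).mean()      # the same quantity written directly

print(torch.allclose(via_bmm, via_sum))       # True: this is -sum_x P(x) * log Q(x)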

If we define the KL divergence for distributions P and Q as KL_loss = -\sum_x P(x) * log(Q(x) / P(x)), then I can't find anywhere in the code above that divides Q by P (or, equivalently, subtracts log(P) from log(Q)).
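
Written out in code, the term I am missing would look something like this (again on placeholder logits; none of these names come from the repo):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
student_logits = torch.randn(4, 10)
teacher_logits = torch.randn(4, 10)

P     = F.softmax(teacher_logits, dim=1)
log_P = F.log_softmax(teacher_logits, dim=1)
log_Q = F.log_softmax(student_logits, dim=1)

cross_entropy = -(P * log_Q).sum(dim=1).mean()            # what the class above computes
kl_div        = -(P * (log_Q - log_P)).sum(dim=1).mean()  # KL as defined above, with the log P(x) term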

The following IPython history confirms what I suspected. However, when the same code is wrapped in the class that inherits from _Loss, it magically works out, and I can't understand why. Please help.

In [60] runs the code outside the class => wrong answer.
In [61] runs the code inside the class => correct answer.
In [62] runs the "correct" code (with the log(P) subtraction) outside the class => correct answer.

In [60]: -torch.bmm(F.softmax(outputs_teacher, dim=1).unsqueeze(1), F.log_softmax(outputs, dim=1).unsqueeze(2)).mean()
Out[60]: tensor(11.1128, grad_fn=<NegBackward>)

In [61]: DistributionLoss()(outputs, outputs_teacher)
Out[61]: tensor(5.4601, grad_fn=<MeanBackward0>)

In [62]: -torch.bmm(F.softmax(outputs_teacher, dim=1).unsqueeze(1), (F.log_softmax(outputs, dim=1) - F.log_softmax(outputs_teacher, dim=1)).unsqueeze(2)).mean()
Out[62]: tensor(5.4601, grad_fn=<NegBackward>)
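
As a further cross-check on In [62], I would expect that expression to agree with PyTorch's built-in F.kl_div under reduction='batchmean'. A sketch on random logits, since I can't paste my real outputs / outputs_teacher tensors here:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
outputs = torch.randn(4, 10)          # stand-in for my student logits
outputs_teacher = torch.randn(4, 10)  # stand-in for my teacher logits

P = F.softmax(outputs_teacher, dim=1)
log_P = F.log_softmax(outputs_teacher, dim=1)
log_Q = F.log_softmax(outputs, dim=1)

manual  = -torch.bmm(P.unsqueeze(1), (log_Q - log_P).unsqueeze(2)).mean()  # the In [62] expression
builtin = F.kl_div(log_Q, P, reduction='batchmean')                        # sum over classes, mean over batch

print(torch.allclose(manual, builtin))  # True on this toy example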