I am testing the KLDivLoss implementation here: https://github.com/liuzechun/ReActNet/blob/465f9ba458b3937915e5e5613a85b74123d9ff00/utils/KD_loss.py#L8

It can be simplified to:

```python
import torch
import torch.nn.functional as F
from torch.nn.modules.loss import _Loss


class DistributionLoss(_Loss):
    def forward(self, model_output, real_output):
        model_output_log_prob = F.log_softmax(model_output, dim=1)
        real_output_soft = F.softmax(real_output, dim=1)
        del model_output, real_output

        # (batch, 1, classes) @ (batch, classes, 1) -> (batch, 1, 1)
        real_output_soft = real_output_soft.unsqueeze(1)
        model_output_log_prob = model_output_log_prob.unsqueeze(2)
        cross_entropy_loss = -torch.bmm(real_output_soft, model_output_log_prob)
        return cross_entropy_loss.mean()
```

If we define the KL divergence for distributions P and Q as `KL_loss = -\sum P(x) * log(Q(x) / P(x))`, then I cannot find anywhere in the code above a division of Q by P (or, equivalently, a subtraction of log(P) from log(Q)).
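To make the gap concrete: the expression in the class is the cross-entropy `-\sum P(x) * log(Q(x))`, which exceeds `KL(P || Q)` by exactly the entropy of P. A minimal sketch with hypothetical random logits (the names `student_logits` and `teacher_logits` are my own, not from the repo), checked against PyTorch's built-in `F.kl_div`:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
student_logits = torch.randn(4, 10)  # hypothetical student outputs
teacher_logits = torch.randn(4, 10)  # hypothetical teacher outputs

p = F.softmax(teacher_logits, dim=1)          # P (teacher distribution)
log_q = F.log_softmax(student_logits, dim=1)  # log Q (student log-probs)

# Cross-entropy: -sum_x P(x) log Q(x), averaged over the batch
ce = -(p * log_q).sum(dim=1).mean()

# KL(P || Q) via the built-in; input is log-probs, target is probs
kl = F.kl_div(log_q, p, reduction='batchmean')

# Entropy of P: -sum_x P(x) log P(x) (softmax output is strictly positive)
h_p = -(p * p.log()).sum(dim=1).mean()

# Cross-entropy = KL divergence + entropy of P
assert torch.allclose(ce, kl + h_p, atol=1e-6)
```

So the two quantities differ by `H(P)`, which is constant in the student's parameters but not zero as a loss value.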

The following IPython history confirms my suspicion. However, when the same computation is wrapped in the class inheriting from `_Loss`, it magically works out, and I cannot understand why. Please help.

`In [60]` runs the code outside the class => wrong answer.

`In [61]` runs the code inside the class => correct answer.

`In [62]` runs the "correct" code (with the `log_softmax(outputs_teacher)` subtraction) outside the class => correct answer.

```
In [60]: -torch.bmm(F.softmax(outputs_teacher, dim=1).unsqueeze(1), F.log_softmax(outputs, dim=1).unsqueeze(2)).mean()
Out[60]: tensor(11.1128, grad_fn=<NegBackward>)

In [61]: DistributionLoss()(outputs, outputs_teacher)
Out[61]: tensor(5.4601, grad_fn=<MeanBackward0>)

In [62]: -torch.bmm(F.softmax(outputs_teacher, dim=1).unsqueeze(1), (F.log_softmax(outputs, dim=1) - F.log_softmax(outputs_teacher, dim=1)).unsqueeze(2)).mean()
Out[62]: tensor(5.4601, grad_fn=<NegBackward>)
```
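For what it's worth, the `In [62]` formulation (with the `log_softmax(outputs_teacher)` subtraction) does agree with PyTorch's built-in KL divergence; a sketch with random stand-in tensors (the exact values above depend on the real `outputs` / `outputs_teacher`, which I don't reproduce here):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
outputs = torch.randn(4, 10)          # stand-in for the student logits
outputs_teacher = torch.randn(4, 10)  # stand-in for the teacher logits

# The In [62] formulation: -sum_x P(x) * (log Q(x) - log P(x)), batch-averaged
bmm_kl = -torch.bmm(
    F.softmax(outputs_teacher, dim=1).unsqueeze(1),
    (F.log_softmax(outputs, dim=1)
     - F.log_softmax(outputs_teacher, dim=1)).unsqueeze(2),
).mean()

# Same quantity via the built-in KL divergence
builtin_kl = F.kl_div(F.log_softmax(outputs, dim=1),
                      F.softmax(outputs_teacher, dim=1),
                      reduction='batchmean')

assert torch.allclose(bmm_kl, builtin_kl, atol=1e-6)
```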