In my project, I used `nn.KLDivLoss()(A.log(), B)` to calculate the KL divergence between A and B.

But now I have to rewrite this part in numpy (or other general Python libraries) for some model conversion.

I searched a lot on Google and tried the following implementations:

(1) First method:

```
import numpy as np

def KL(a, b):
    a = np.asarray(a, dtype=float)  # np.float is deprecated
    b = np.asarray(b, dtype=float)
    return np.sum(np.where(a != 0, a * np.log(a / b), 0))
```

(2) Second method: `scipy.special.kl_div`

Neither of them shows output similar to the `nn.KLDivLoss` method in PyTorch.

Could someone provide some guidance on how to implement KL divergence with numpy so that it matches the PyTorch output? Thanks.
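For context, my current understanding (which may be the source of my confusion) is that `nn.KLDivLoss(input, target)` computes `target * (log(target) - input)` pointwise, where `input` is already a log-probability, and then applies a `reduction` that defaults to `'mean'` over *all* elements rather than summing. Below is a numpy sketch of that reading, assuming `a` and `b` are strictly positive probability distributions (`a` plays the role of the tensor passed in before `.log()`); the helper name is mine:

```python
import numpy as np

def kl_div_like_pytorch(a, b, reduction="mean"):
    """Sketch of nn.KLDivLoss(a.log(), b) in numpy.

    a : the distribution whose log is the 'input' (must be > 0)
    b : the 'target' distribution (must be > 0)
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pointwise term: target * (log(target) - log(input)).
    pointwise = b * (np.log(b) - np.log(a))
    if reduction == "sum":
        return pointwise.sum()
    if reduction == "batchmean":
        # Divide the sum by the batch size (first dimension).
        return pointwise.sum() / a.shape[0]
    # Default 'mean': average over ALL elements, not just the batch.
    return pointwise.mean()
```

If this reading is right, `reduction="sum"` reproduces the classic `np.sum(b * np.log(b / a))`, and the default `'mean'` divides that by the total element count, which would explain why my first method and the PyTorch result differ by a constant factor.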