Hi Zhen Qiang,
Thanks for your reply, but I'm still confused.
If P is the true distribution and Q is the predicted distribution,
is it right to say that
the call should be F.kl_div(Q, P), because Q is the input and P is the target?
But in practice, to avoid the underflow issue, F.kl_div(P.log(), Q) is often used.
Is my understanding correct?
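
To make the question concrete, here is a minimal sketch of the check I have in mind. The values of P and Q are made up for illustration, and I am assuming (from my reading of the docs) that F.kl_div expects its first argument as log-probabilities and its second as plain probabilities when log_target is left at its default. It compares both argument orders against KL written out by hand:

```python
import torch
import torch.nn.functional as F

# Hypothetical example distributions (not from this thread).
P = torch.tensor([0.7, 0.2, 0.1])  # true distribution
Q = torch.tensor([0.5, 0.3, 0.2])  # predicted distribution

# KL(P || Q) and KL(Q || P) written out from the definition.
kl_p_q = (P * (P.log() - Q.log())).sum()
kl_q_p = (Q * (Q.log() - P.log())).sum()

# Assumption: input is in log-space, target is in probability space.
kl_log_q = F.kl_div(Q.log(), P, reduction="sum")  # log of prediction as input
kl_log_p = F.kl_div(P.log(), Q, reduction="sum")  # the form from my question

print("KL(P||Q) by hand:      ", kl_p_q.item())
print("KL(Q||P) by hand:      ", kl_q_p.item())
print("F.kl_div(Q.log(), P):  ", kl_log_q.item())
print("F.kl_div(P.log(), Q):  ", kl_log_p.item())
```

If that assumption holds, F.kl_div(Q.log(), P) should match KL(P || Q), while F.kl_div(P.log(), Q) should match the reverse direction KL(Q || P), so the two calls would not be interchangeable.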
I