Question about the usage of kl_divergence

Hi Zhen Qiang,

Thanks for your reply, but I am still confused.

When P is the true distribution and Q is the predicted distribution,
may I say that it should be F.kl_div(Q, P), because Q is the input and P is the target?
But usually, to avoid underflow issues, F.kl_div(P.log(), Q) is used instead.

Is my understanding correct?
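
To make the question concrete, here is a minimal sketch of the check I have in mind. The values of P and Q are made up for illustration, and it assumes (from my reading of the PyTorch docs) that F.kl_div takes its input as log-probabilities and its target as plain probabilities, with reduction="sum" so the result is comparable to the textbook sum.

```python
import torch
import torch.nn.functional as F

# Hypothetical example distributions (not taken from the thread).
P = torch.tensor([0.7, 0.2, 0.1])  # true distribution (target)
Q = torch.tensor([0.5, 0.3, 0.2])  # predicted distribution (input)

# KL(P || Q) written out from the definition: sum_i P_i * (log P_i - log Q_i)
kl_manual = torch.sum(P * (P.log() - Q.log()))

# Assumption: input is given in log space, target as probabilities.
kl_log_on_q = F.kl_div(Q.log(), P, reduction="sum")

# The argument order from the question: log taken on P, Q passed as target.
kl_log_on_p = F.kl_div(P.log(), Q, reduction="sum")

# KL(Q || P) from the definition, for comparison with the call above.
kl_manual_reversed = torch.sum(Q * (Q.log() - P.log()))

print(kl_manual.item(), kl_log_on_q.item(), kl_log_on_p.item())
```

If that assumption holds, the first two printed numbers should match and the third should differ, which would suggest that the log belongs on Q rather than on P when P is the target.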

I