How to compute KL for other distributions

Hi,

If the KL divergence for a pair of distributions is not implemented in torch.distributions, what is the best way to implement it myself while avoiding numerical instabilities (e.g. getting KL < 0)? Should I just follow the definition: p * (log_p - log_q)?
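To make the question concrete, here is a rough sketch of the two options I am considering; the function names and the Normal example are just illustrative:

```python
import torch
from torch.distributions import Normal

p = Normal(torch.tensor(0.0), torch.tensor(1.0))
q = Normal(torch.tensor(1.0), torch.tensor(2.0))

# Option 1: Monte Carlo estimate, staying entirely in log space.
# Draw samples from p and average the log-density difference.
def kl_monte_carlo(p, q, num_samples=10_000):
    x = p.sample((num_samples,))
    return (p.log_prob(x) - q.log_prob(x)).mean()

print(kl_monte_carlo(p, q))  # noisy estimate; can come out slightly negative

# Option 2: for discrete distributions with explicit probability vectors,
# apply the definition with torch.xlogy, which returns 0 where p == 0
# instead of producing 0 * (-inf) = nan.
def kl_discrete(p_probs, q_probs):
    return (torch.xlogy(p_probs, p_probs) - torch.xlogy(p_probs, q_probs)).sum()

p_probs = torch.tensor([0.5, 0.5, 0.0])
q_probs = torch.tensor([0.4, 0.4, 0.2])
print(kl_discrete(p_probs, q_probs))
```

My understanding is that the Monte Carlo estimator can return slightly negative values purely from sampling noise, even though the true KL is non-negative, so I am not sure if that is the "instability" I should worry about, or whether I should instead register an analytic formula via the torch.distributions.kl.register_kl decorator when one is available.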

Thanks