Hi,
If the KL divergence for a given pair of distributions is not implemented in torch.distributions, what is the best way to implement it myself while avoiding numerical instabilities (i.e. getting KL < 0)? Should I just follow the definition, p * (log_p - log_q)?
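To make it concrete, here is a sketch of what I currently have for the categorical case. kl_categorical is just my own helper, not a torch API; the idea is to stay in log space via log_softmax so that log_p and log_q are computed consistently:

```python
import torch

def kl_categorical(logits_p, logits_q):
    # Hypothetical helper (not a torch API): KL(p || q) for categorical
    # distributions parameterized by unnormalized logits.
    # Using log_softmax keeps log_p and log_q finite and mutually
    # consistent; computing p and log(p) in separate passes is what
    # tends to produce the tiny negative KL values.
    log_p = torch.log_softmax(logits_p, dim=-1)
    log_q = torch.log_softmax(logits_q, dim=-1)
    p = log_p.exp()
    # Definition: sum of p * (log_p - log_q) over the support.
    return (p * (log_p - log_q)).sum(dim=-1)

logits_p = torch.randn(8, 10)
logits_q = torch.randn(8, 10)
kl = kl_categorical(logits_p, logits_q)  # shape (8,), should be >= 0
```

Is something like this the right approach, or would torch.nn.functional.kl_div (with log_target=True) be preferable here?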
Thanks