Modified Bessel Function of Order 0

Thank you @tom. I ended up computing the KL term separately with SciPy and then moving the result to the GPU. The gradient is not taken with respect to kappa, so I don't think this should cause any problems. This is the KL-divergence term I would like to compute:

d is the dimension of the latent space.
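
For anyone following along, here is a minimal sketch of the "compute with SciPy, then move to the GPU" step. It only covers the order-0 Bessel factor from the thread title, not the full KL term. The helper name `log_i0` and the example kappa values are my own placeholders, and it assumes kappa is treated as a constant, so no gradient flows through it.

```python
import numpy as np
import torch
from scipy import special


def log_i0(kappa: torch.Tensor) -> torch.Tensor:
    """Return log I_0(kappa) as a constant tensor on kappa's device.

    Hypothetical helper: the value is computed on the CPU with SciPy,
    so autograd does not track it (no gradient w.r.t. kappa).
    """
    k = kappa.detach().cpu().numpy().astype(np.float64)
    # scipy.special.i0e(k) = exp(-|k|) * I_0(k), so adding |k| back in
    # log space avoids the overflow that I_0 itself hits for large kappa.
    out = np.log(special.i0e(k)) + np.abs(k)
    return torch.from_numpy(np.asarray(out)).to(device=kappa.device, dtype=kappa.dtype)


# Use the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
kappa = torch.tensor([0.5, 10.0, 800.0], device=device)  # example values only
bessel_term = log_i0(kappa)
```

The result can then be added to the rest of the loss like any other constant tensor. If a gradient with respect to kappa were ever needed, this approach would not work as written, because the SciPy call sits outside the autograd graph.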