What is behind the pathwise derivative in rsample?

Hi everyone,

I couldn’t find out how PyTorch implements the reparameterization trick behind the rsample method (Probability distributions - torch.distributions, PyTorch 1.9.0 documentation).

For distributions that do not belong to the location-scale family, such as the Gamma distribution, does rsample use implicit reparameterization gradients (https://arxiv.org/pdf/1805.08498.pdf) or some other approximation?
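
To make the question concrete, here is a minimal sketch of the behavior I am asking about. The parameter values are arbitrary choices of mine, and the snippet only shows that a gradient reaches the Gamma parameters; it does not show how that gradient is computed internally, which is exactly what I would like to understand.

```python
import torch
from torch.distributions import Gamma, Normal

torch.manual_seed(0)

# Location-scale case: the sample can be written as loc + scale * eps,
# so the pathwise derivative with respect to loc is simply 1.
loc = torch.tensor(0.5, requires_grad=True)
scale = torch.tensor(2.0, requires_grad=True)
x = Normal(loc, scale).rsample()
x.backward()
print("Normal: d x / d loc =", loc.grad.item())

# Gamma case: there is no location-scale form for the shape parameter,
# yet rsample is available and backward() still yields a gradient.
concentration = torch.tensor(2.0, requires_grad=True)  # arbitrary value
rate = torch.tensor(1.0, requires_grad=True)           # arbitrary value
z = Gamma(concentration, rate).rsample()
z.backward()
print("Gamma.has_rsample:", Gamma.has_rsample)
print("Gamma: d z / d concentration =", concentration.grad.item())
print("Gamma: d z / d rate =", rate.grad.item())
```

The gradient with respect to concentration is the one I cannot account for with the usual location-scale argument.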