So I have the following code snippet:
import torch loc = torch.tensor(0., requires_grad=True) scale = torch.tensor(1., requires_grad=True) gaussian_test = torch.distributions.Normal(loc, scale) gaussian_y.log_prob(torch.tensor(0.)).backward() print(loc.grad, scale.grad) # None, None
Since calculating a logPDF at a particular point consists of exclusively differentiable operations, then I would expect to be able to get a gradient of a
log_prob operation with respect to the distribution parameters. However this is not the case. Why would that happen?