So I have the following code snippet:
import torch
loc = torch.tensor(0., requires_grad=True)
scale = torch.tensor(1., requires_grad=True)
gaussian_test = torch.distributions.Normal(loc, scale)
gaussian_y.log_prob(torch.tensor(0.)).backward()
print(loc.grad, scale.grad) # None, None
Since calculating a logPDF at a particular point consists of exclusively differentiable operations, then I would expect to be able to get a gradient of a log_prob
operation with respect to the distribution parameters. However this is not the case. Why would that happen?