However, the gradient with respect to `mean_linear` (weight and bias) is 0; all other gradients seem correct (`linear1`, `linear2`, `log_std_linear`). I have absolutely no idea why this is the case. When checking the source code of `log_prob`, I can see that it uses both `self.loc` (the mean) and `self.scale` (the std), so the gradient should propagate, but somehow the output is always `0`. Why does this happen, and how can I fix it?

When you call .log_prob(x_t) using the same Normal distribution from
which you .rsample()ed x_t, the mean drops out, so you are correctly
getting zero for your gradient.

What’s going on is that when you use “reparameterization sampling,” the
Distribution generates a random variate that does not depend on your
parameters (for Normal, a standard-normal variate) and transforms it into
a variate from your desired distribution. This transformation is
differentiable with respect to the parameters that describe your
distribution, so you can backpropagate through the rsample()ed value.
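
As a rough illustration of that transformation, here is a minimal sketch of what reparameterized sampling amounts to for a Normal: draw noise that has nothing to do with the parameters, then shift and scale it. The names mean, std, and eps and their values are made up for this example and are not from the original post.

```python
import torch
from torch.distributions import Normal

torch.manual_seed(0)

# Hypothetical parameters, chosen only for illustration.
mean = torch.tensor(1.5, requires_grad=True)
std = torch.tensor(0.7, requires_grad=True)

# Manual reparameterization: parameter-free noise, then shift and scale.
eps = torch.randn(())          # eps ~ N(0, 1), independent of mean and std
x_manual = mean + std * eps    # differentiable in both mean and std

# Backpropagate through the sample itself.
x_manual.backward()
mean_grad = mean.grad.item()   # d(x)/d(mean) = 1
std_grad = std.grad.item()     # d(x)/d(std)  = eps

# rsample() likewise returns a tensor that is attached to the autograd graph.
x_t = Normal(mean, std).rsample()
has_grad_fn = x_t.grad_fn is not None
```

So the sample itself carries a gradient with respect to the mean; it is only after passing it through log_prob() of the same distribution that the mean cancels.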

In the case of Normal, that random variate tells you how improbable
your sample should be – how many standard deviations it should be
from the mean of your Normal distribution. log_prob() then tells you
how improbable your sample is – that is, how many standard deviations
it is from that same mean. So log_prob() is independent of mean, and
your gradient is zero.
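
To make the cancellation concrete, here is a small sketch (the scalar parameters and their values are assumptions for illustration only). Writing x_t = mean + std * eps, the log-density becomes -eps^2 / 2 - log(std) - const, which contains no mean at all, so the gradient with respect to the mean should vanish while the gradient with respect to std should be -1 / std.

```python
import torch
from torch.distributions import Normal

torch.manual_seed(0)

# Hypothetical scalar parameters standing in for the network outputs.
mean = torch.tensor(1.5, requires_grad=True)
std = torch.tensor(0.7, requires_grad=True)

dist = Normal(mean, std)
x_t = dist.rsample()         # x_t = mean + std * eps, still attached to the graph
log_p = dist.log_prob(x_t)   # evaluated under the same distribution
log_p.backward()

# The path through x_t and the direct path through dist.loc cancel.
mean_grad = mean.grad.item()
# Only the -log(std) term survives, giving -1 / std.
std_grad = std.grad.item()
```

This matches what the question reports: the layer producing the mean gets a zero gradient, while the layer producing the (log) std still gets a nonzero one.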

Thanks for the insightful answer. It raises another question, in case you know. In RL, one differentiates the log of the policy with respect to its parameters, e.g., the mean and the variance. Arguably, getting the mean right is more important than getting the variance right; some RL algorithms, for instance, assume a fixed variance and learn only the mean.

How can one go about getting the gradient of the mean? Does it have to be done manually?
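
One possible approach, sketched here under the assumption that a score-function (REINFORCE-style) gradient is what is wanted, is to evaluate log_prob() on a sample that is not attached to the graph, for example one drawn with sample() instead of rsample(). The mean then no longer cancels, and autograd gives (x - mean) / std^2 without any manual differentiation. The parameter values and the name action are hypothetical.

```python
import torch
from torch.distributions import Normal

torch.manual_seed(0)

# Hypothetical scalar parameters standing in for the policy outputs.
mean = torch.tensor(1.5, requires_grad=True)
std = torch.tensor(0.7, requires_grad=True)
dist = Normal(mean, std)

# sample() draws without tracking gradients, so the action is a constant
# as far as autograd is concerned.
action = dist.sample()
log_p = dist.log_prob(action)
log_p.backward()

mean_grad = mean.grad.item()
# Analytic score with respect to the mean: (x - mean) / std^2.
expected = ((action - mean) / std ** 2).detach().item()
```

In an actual policy-gradient loss, log_p would typically be weighted by a return or advantage before calling backward(); that part is left out here.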