In this file, how should we understand the backward pass for the Normal distribution?
To keep it simple, consider the 1D case: given a mean and std, a sample is drawn as sample = mean + std * eps, where eps ~ N(0, 1).
In the backward pass, the gradient reported is grad_mean = -reward * (sample - mean) / std**2.
It is not clear to me why it takes that form. If we already have a sample, then by the formula above d_sample/d_mean = 1, so I would expect grad_mean = upstream_gradient * d_sample/d_mean.
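One way to see where a formula of this shape could come from (this is my reading, not confirmed by the file itself) is the score-function (REINFORCE) estimator: instead of differentiating through sample = mean + std * eps (the pathwise view, which would indeed give d_sample/d_mean = 1), the sample is treated as fixed and the gradient is taken of reward * (-log p(sample)) with respect to mean. The pure-Python sketch below checks numerically that -reward * (sample - mean) / std**2 is exactly that derivative; the variable names (`reward`, `sample`, `neg_log_prob`) are illustrative, not from any particular library.

```python
import math

def neg_log_prob(sample, mean, std):
    # Negative log-density of N(mean, std^2) evaluated at `sample`.
    return 0.5 * ((sample - mean) / std) ** 2 + math.log(std) + 0.5 * math.log(2 * math.pi)

def score_grad_mean(reward, sample, mean, std):
    # Candidate analytic gradient from the question:
    # d/d_mean [reward * (-log p(sample))] = -reward * (sample - mean) / std**2
    return -reward * (sample - mean) / std ** 2

reward, sample, mean, std = 2.0, 1.3, 0.5, 0.7
h = 1e-6  # central finite-difference step

# Numeric derivative of reward * neg_log_prob w.r.t. mean,
# holding `sample` fixed (the score-function view).
numeric = (reward * neg_log_prob(sample, mean + h, std)
           - reward * neg_log_prob(sample, mean - h, std)) / (2 * h)
analytic = score_grad_mean(reward, sample, mean, std)

# The two agree, while the pathwise view would instead give
# upstream_gradient * d_sample/d_mean = reward * 1 here.
print(numeric, analytic)
```

So the quoted formula matches the score-function derivative of the log-probability, not the pathwise derivative through the sampling formula; the two are different gradient estimators of the same expected reward.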