How to understand backward() in stochastic functions?
For example, for the Normal distribution, grad_mean = -(output - mean)/std**2.
Why does it follow this formula? Is it the derivative of the Gaussian PDF? The forward pass only uses output = mean + std*eps,
where eps ~ N(0, 1), so shouldn't the gradient w.r.t. mean be the identity?
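
To make the two candidate gradients concrete, here is a minimal sketch in plain Python (no PyTorch, so it does not reproduce the library's actual backward()). It compares (a) the derivative of the Gaussian log-density w.r.t. mean with the sample held fixed, and (b) the derivative of output = mean + std*eps w.r.t. mean with eps held fixed. The helper names normal_log_pdf and central_diff and the numeric values are made up for illustration. If the formula above is the negated version of (a), the minus sign would presumably come from how the backward pass is defined (for instance a quantity being minimized, possibly scaled by a reward), but that is an assumption, not something confirmed here.

```python
import math

def normal_log_pdf(x, mean, std):
    """Log-density of N(mean, std**2) evaluated at x."""
    return (-0.5 * math.log(2 * math.pi)
            - math.log(std)
            - 0.5 * ((x - mean) / std) ** 2)

def central_diff(f, m, h=1e-5):
    """Central finite-difference estimate of df/dm at m (hypothetical helper)."""
    return (f(m + h) - f(m - h)) / (2 * h)

# Arbitrary illustrative values (assumptions, not taken from any real run).
mean, std, eps = 0.5, 2.0, 0.3
output = mean + std * eps  # forward pass as described in the question

# (a) Sample held fixed: differentiate log p(output) w.r.t. mean.
score_numeric = central_diff(lambda m: normal_log_pdf(output, m, std), mean)
score_formula = (output - mean) / std ** 2

# The expression from the question is the negative of (a).
grad_mean_from_question = -(output - mean) / std ** 2

# (b) eps held fixed: differentiate the sample itself w.r.t. mean.
pathwise_numeric = central_diff(lambda m: m + std * eps, mean)

print("d log p / d mean (numeric):", score_numeric)
print("(output - mean)/std**2    :", score_formula)
print("formula from the question :", grad_mean_from_question)
print("d output / d mean         :", pathwise_numeric)
```

In this sketch, (a) matches (output - mean)/std**2 up to floating-point error, while (b) is the identity gradient expected from output = mean + std*eps. So the formula in the question looks like a gradient of the log-density (not of the PDF itself, and not of the sampling path), up to sign.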