How should I understand
backward() in stochastic functions?
For example, for the Normal distribution,
grad_mean = -(output - mean)/std**2. Why does it follow this formula? Is it the derivative of the Gaussian PDF? The forward pass only uses
output = mean + std*eps, where eps ~ N(0, 1), so shouldn't the gradient w.r.t. mean be the identity?
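
To make the two candidate readings concrete, here is a minimal sketch in plain Python (no autograd library). It checks numerically that (output - mean)/std**2 is the derivative of the Gaussian log-density with respect to mean, and compares a gradient estimate built from that term with one built from the "identity" derivative of output = mean + std*eps. The helper names and the objective f(x) = x**2 are hypothetical, chosen only for illustration. The leading minus sign in grad_mean is presumably a sign convention (for example, turning a reward into a loss); that is an assumption and is left out here.

```python
import math
import random


def log_normal_pdf(x, mean, std):
    """Log-density of N(mean, std**2) evaluated at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def score_wrt_mean(x, mean, std):
    """Analytic d/dmean of log_normal_pdf, i.e. (x - mean) / std**2."""
    return (x - mean) / std ** 2


def estimate_gradients(f, df, mean, std, n_samples=200_000, seed=0):
    """Estimate d/dmean E[f(x)] for x ~ N(mean, std**2) in two ways."""
    rng = random.Random(seed)
    score_sum = 0.0
    pathwise_sum = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        x = mean + std * eps  # same forward pass as in the question
        # Score-function view: weight f(x) by d log p(x) / d mean.
        score_sum += f(x) * score_wrt_mean(x, mean, std)
        # Pathwise view: d x / d mean = 1, so the chain rule gives f'(x) * 1.
        pathwise_sum += df(x) * 1.0
    return score_sum / n_samples, pathwise_sum / n_samples


mean, std = 1.5, 0.5

# Central finite difference of the log-density with respect to mean at x0.
x0, h = 2.0, 1e-5
finite_diff = (log_normal_pdf(x0, mean + h, std) - log_normal_pdf(x0, mean - h, std)) / (2 * h)
analytic = score_wrt_mean(x0, mean, std)

# Hypothetical objective f(x) = x**2, for which E[f(x)] = mean**2 + std**2.
score_grad, pathwise_grad = estimate_gradients(lambda x: x ** 2, lambda x: 2 * x, mean, std)

print("finite difference vs analytic score:", finite_diff, analytic)
print("score-function estimate:", score_grad)
print("pathwise estimate:", pathwise_grad)
print("exact gradient 2 * mean:", 2 * mean)
```

Under these assumptions both estimates approach the same value, 2 * mean, with the score-function estimate noticeably noisier. The score-function form only needs the value f(output) and never differentiates through the sample, which may be why a stochastic function would use it instead of the identity derivative.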