Let’s say we have:
dL/dw1 = dL/dy * dy/dw1
dL/dv1 = dL/dy * dy/dw1 * dw1/dv1
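As a quick sanity check of the two chain-rule expressions, here is a minimal PyTorch sketch with a hypothetical scalar chain v1 → w1 → y → L (w1 = 3*v1, y = w1**2, L = 5*y). Here w1 is computed from v1 so that dw1/dv1 exists, as the second formula assumes. The functions and values are illustrative only. Autograd's results match the hand-computed products.

```python
import torch

# Hypothetical scalar chain v1 -> w1 -> y -> L
v1 = torch.tensor(2.0, requires_grad=True)
w1 = 3 * v1          # dw1/dv1 = 3
w1.retain_grad()     # keep .grad on the non-leaf tensor w1
y = w1 ** 2          # dy/dw1 = 2 * w1
L = 5 * y            # dL/dy = 5

L.backward()

# Chain rule by hand
dL_dy = 5.0
dy_dw1 = 2 * w1.item()   # 2 * 6.0 = 12.0
dw1_dv1 = 3.0
print(w1.grad.item(), dL_dy * dy_dw1)             # 60.0 60.0
print(v1.grad.item(), dL_dy * dy_dw1 * dw1_dv1)   # 180.0 180.0
```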
w1 and v1 are both parameters that will be returned in module.named_parameters().
Let’s say we have a deep network with many nested parameters.
If I manually modify dL/dy (say, by setting dL/dy = dL/dy * 10000), how do I propagate this change to every parameter gradient that depends on it through the chain rule?
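One common way to do this in PyTorch, sketched here under assumptions: register a hook on the intermediate tensor y with `Tensor.register_hook`. During `backward()` the hook receives dL/dy and may return a replacement gradient. Autograd then uses the returned value for everything upstream of y, so every parameter that feeds into y picks up the scaling through the chain rule. Parameters whose gradients do not pass through y are unaffected. The tiny network, the choice of y as the first layer's output, and the helper `param_grads` are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy network; y is the output of the first Linear layer.
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1)).double()
x = torch.randn(8, 4, dtype=torch.float64)

def param_grads(scale=None):
    """Run one forward/backward pass and return a copy of every parameter gradient."""
    model.zero_grad()
    y = model[0](x)                  # intermediate tensor whose gradient dL/dy we edit
    if scale is not None:
        # The hook receives dL/dy during backward() and returns the modified
        # gradient; autograd propagates it to everything upstream of y.
        y.register_hook(lambda grad: grad * scale)
    out = model[2](model[1](y))
    L = out.pow(2).mean()
    L.backward()
    return {name: p.grad.clone() for name, p in model.named_parameters()}

scale = 10000.0
base = param_grads()
scaled = param_grads(scale)

for name in base:
    was_scaled = torch.allclose(scaled[name], base[name] * scale)
    unchanged = torch.equal(scaled[name], base[name])
    print(f"{name}: scaled={was_scaled}, unchanged={unchanged}")
```

In this sketch the first layer's parameters (`0.weight`, `0.bias`) sit upstream of y and come out multiplied by 10000. The last layer's parameters (`2.weight`, `2.bias`) do not depend on dL/dy and stay the same. An alternative is to compute `g = torch.autograd.grad(L, y, retain_graph=True)[0]` and then call `y.backward(g * 10000)`. That backpropagates the edited dL/dy into the parameters upstream of y only, so parameters downstream of y would need their gradients from a separate pass.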