In another thread it was pointed out that `grad_input` holds the gradients w.r.t. the inputs of the last operation of the layer, while `grad_output` holds the gradient w.r.t. the output of the layer. So which of these is used in the next step of the chain rule, i.e. to compute the gradient of the layer preceding this one?
Does `grad_input` contain gradients w.r.t. the parameters only, and is therefore not used in further computations, or is it passed on to the next computation? Or is `grad_output` what gets used in the next computation?
Is there a tutorial that explains how `grad_input` and `grad_output` are used in the computation (specific to PyTorch, not the chain rule in general)?