In another thread it was pointed out that `grad_input` is the gradient w.r.t. the inputs of the last operation of the layer, while `grad_output` is the gradient w.r.t. the output of the layer. So which of these is used in the next step of the chain rule (the gradient of the layer preceding this one)?

Does `grad_input` contain gradients w.r.t. the parameters only, and is thus not used in further computations, or is it passed on to the next computation? Or is `grad_output` what gets used in the next computation?
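To make the question concrete, here is a minimal sketch of the kind of hook I mean, using `register_full_backward_hook` on a toy `Linear(3, 2)` layer (the layer, shapes, and `captured` dict are just illustrative assumptions):

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 2)
captured = {}

def hook(module, grad_input, grad_output):
    # grad_output[0]: dLoss/d(layer output) -- what the chain rule hands this layer
    # grad_input[0]:  dLoss/d(layer input)  -- presumably what flows further back?
    captured["grad_input"] = grad_input
    captured["grad_output"] = grad_output

layer.register_full_backward_hook(hook)

x = torch.randn(4, 3, requires_grad=True)
layer(x).sum().backward()

print([tuple(g.shape) for g in captured["grad_output"]])  # [(4, 2)]
print([tuple(g.shape) for g in captured["grad_input"]])   # [(4, 3)]
```

Here `grad_input` has the shape of the layer's input and matches `x.grad`, which is what prompts my question about which tensor actually propagates backward.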

Is there a tutorial out there that explains how `grad_input` and `grad_output` are used in the computation (specific to PyTorch, not the chain rule in general)?