Exact meaning of grad_input and grad_output

HjalmarLucius · March 27, 2019, 11:16pm

Further to this: would it not make sense to also pass the forward inputs and outputs to the backward hook? They should already be in memory.

To debug e.g. errors / spikes in gradients, one often wants access to all the variables the bwd operation used (fwd IO, bwd IO and Parameter grads).
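As a possible workaround in the meantime, here is a minimal sketch that caches the forward IO in a forward hook so the backward hook can read it. It assumes a PyTorch version that provides `register_full_backward_hook`; the helper name `attach_debug_hooks` and the `records` list are made up for illustration and are not part of PyTorch.

```python
import torch
import torch.nn as nn


def attach_debug_hooks(module, records):
    """Hypothetical helper: record fwd IO and bwd IO for a single module."""
    cache = {}

    def forward_hook(mod, inputs, output):
        # Stash detached copies of the forward IO for the backward hook.
        cache["inputs"] = tuple(t.detach() for t in inputs)
        cache["output"] = output.detach()

    def backward_hook(mod, grad_input, grad_output):
        # Bundle the cached forward IO with the gradients seen by this hook.
        records.append({
            "fwd_inputs": cache.get("inputs"),
            "fwd_output": cache.get("output"),
            "grad_input": grad_input,
            "grad_output": grad_output,
        })

    return [
        module.register_forward_hook(forward_hook),
        module.register_full_backward_hook(backward_hook),
    ]


torch.manual_seed(0)
layer = nn.Linear(3, 2)
records = []
handles = attach_debug_hooks(layer, records)

x = torch.randn(4, 3, requires_grad=True)
layer(x).sum().backward()

rec = records[0]
print(rec["fwd_inputs"][0].shape, rec["grad_output"][0].shape)
# Parameter grads are read from .grad once backward() has finished.
print(layer.weight.grad.shape, layer.bias.grad.shape)

for h in handles:
    h.remove()
```

This keeps an extra reference to the forward tensors per hooked module, so it is probably best attached only to the layers under suspicion. The Parameter grads are read after `backward()` returns, since I would not rely on `.grad` being populated at the time the module's backward hook fires.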