Exact meaning of grad_input and grad_output

Further to this: would it not make sense to also pass the forward inputs and outputs to the backward hook? They should already be in memory.

  • To debug e.g. errors / spikes in gradients, one often wants access to all the variables the bwd operation used (fwd IO, bwd IO and Parameter grads).
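In the meantime, a possible workaround is to cache the forward IO in a forward hook and read it back in the backward hook. The following is only a minimal sketch, assuming a module with a single tensor output and using `register_forward_hook` together with `register_full_backward_hook`. The helper `attach_debug_hooks` and the `records` list are hypothetical names made up for illustration; they are not part of PyTorch.

```python
import torch
import torch.nn as nn


def attach_debug_hooks(module, records):
    """Hypothetical helper: record fwd IO together with bwd IO for one module."""
    cache = {}

    def forward_hook(mod, inputs, output):
        # Keep detached references so the cache does not extend the autograd graph.
        # Assumes `output` is a single tensor (true for nn.Linear).
        cache["fwd_in"] = tuple(t.detach() for t in inputs if torch.is_tensor(t))
        cache["fwd_out"] = output.detach()

    def backward_hook(mod, grad_input, grad_output):
        # grad_input / grad_output are tuples; entries can be None for
        # tensors that do not require grad.
        records.append({
            "fwd_in": cache.get("fwd_in"),
            "fwd_out": cache.get("fwd_out"),
            "grad_input": grad_input,
            "grad_output": grad_output,
        })

    return (
        module.register_forward_hook(forward_hook),
        module.register_full_backward_hook(backward_hook),
    )


layer = nn.Linear(3, 2)
x = torch.randn(4, 3, requires_grad=True)
records = []
handles = attach_debug_hooks(layer, records)

layer(x).sum().backward()

# Parameter grads are read after backward() has finished, since they may not
# be accumulated yet at the time the module's backward hook fires.
weight_grad = layer.weight.grad

for h in handles:
    h.remove()
```

This covers fwd IO and bwd IO in one place, but Parameter grads still have to be collected separately, which is part of why having everything available inside the backward hook would be convenient.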