Further to this, would it not make sense to also expose the forward IO (inputs and outputs) in the backward hook? They should already be in memory.
- To debug, e.g., errors or spikes in gradients, one often wants access to all the variables the backward operation used (forward IO, backward IO and parameter grads).
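Until the hook receives this directly, a workaround is to stash the forward IO in a forward hook and pick it up again in the backward hook. This is a minimal sketch that assumes PyTorch >= 1.8 (for `register_full_backward_hook`). The names `forward_io`, `debug_records`, `save_forward_io`, `inspect_backward` and `attach_debug_hooks` are hypothetical helpers, not part of the PyTorch API. Detaching avoids growing the autograd graph, but the stored tensors still stay alive until the backward hook pops them. Parameter grads are not part of `grad_input`, and `.grad` on the parameters may not be populated yet when this hook fires, so those would still need a separate mechanism (e.g. `Tensor.register_hook` on each parameter).

```python
import torch
import torch.nn as nn

# Hypothetical store: forward inputs/outputs per module, filled by the forward hook.
forward_io = {}
# Hypothetical store: everything visible to the backward hook, kept for inspection.
debug_records = {}


def save_forward_io(module, inputs, output):
    # Detached tensors share storage with the originals but are not tracked by autograd.
    forward_io[module] = (tuple(t.detach() for t in inputs), output.detach())


def inspect_backward(module, grad_input, grad_output):
    # Retrieve (and release) the forward IO saved for this module.
    fwd_inputs, fwd_output = forward_io.pop(module)
    debug_records[module] = {
        "fwd_inputs": fwd_inputs,
        "fwd_output": fwd_output,
        "grad_input": grad_input,
        "grad_output": grad_output,
    }
    # Example check: report non-finite gradients arriving at this module's output.
    g = grad_output[0]
    if g is not None and not torch.isfinite(g).all():
        print(f"non-finite gradient flowing into {module}")
    # Returning None leaves grad_input unchanged.


def attach_debug_hooks(model):
    handles = []
    for m in model.modules():
        if len(list(m.children())) == 0:  # hook leaf modules only
            handles.append(m.register_forward_hook(save_forward_io))
            handles.append(m.register_full_backward_hook(inspect_backward))
    return handles


model = nn.Sequential(nn.Linear(3, 5), nn.ReLU(), nn.Linear(5, 1))
handles = attach_debug_hooks(model)

# requires_grad=True so the first layer also gets a gradient w.r.t. its input.
x = torch.randn(4, 3, requires_grad=True)
model(x).sum().backward()

for m, rec in debug_records.items():
    print(type(m).__name__,
          tuple(rec["fwd_inputs"][0].shape),
          tuple(rec["grad_output"][0].shape))

for h in handles:
    h.remove()
```

The dictionary keyed by module works here because each module runs once per forward pass; a module reused several times in one forward (e.g. a shared layer) would need a stack per module instead.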