How to save gradients for next iteration step

Hello, I’m trying to write a custom autograd function that sparsifies the gradients but keeps the non-sparsified portion for the next iteration step as error compensation, as in the paper “Sparsified SGD with Memory”.
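
For reference, the per-parameter update I have in mind is roughly the sketch below (top-k sparsification is just one choice, and `topk_sparsify` / `k` are placeholder names, not anything from the paper’s code):

```python
import torch

def topk_sparsify(grad: torch.Tensor, k: int) -> torch.Tensor:
    # Keep the k largest-magnitude entries of the gradient, zero out the rest.
    flat = grad.flatten()
    idx = flat.abs().topk(k).indices
    out = torch.zeros_like(flat)
    out[idx] = flat[idx]
    return out.view_as(grad)

# Error compensation per parameter, as I understand the paper:
#   corrected = grad + memory            # add back what was dropped last time
#   sparse    = topk_sparsify(corrected, k)
#   memory    = corrected - sparse       # carry the dropped part to the next step
```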

I know I can use ctx to pass information from the forward pass to the backward pass, but what I need is to carry state from the current backward pass over to the next one. Is there a way to do this?
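
One workaround I’m considering (I’m not sure it’s the intended way) is to keep the residual outside of ctx, e.g. in a class-level dict keyed by a name I pass into `apply()`. A minimal sketch, where `SparsifyGrad`, `memory`, `name`, and `k` are names I made up:

```python
import torch

class SparsifyGrad(torch.autograd.Function):
    # Persists across backward passes; ctx only lives for one forward/backward pair.
    memory = {}

    @staticmethod
    def forward(ctx, x, name, k):
        ctx.name, ctx.k = name, k
        return x.view_as(x)  # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        mem = SparsifyGrad.memory.get(ctx.name)
        corrected = grad_output if mem is None else grad_output + mem
        # top-k sparsification of the error-compensated gradient
        flat = corrected.flatten()
        idx = flat.abs().topk(ctx.k).indices
        sparse = torch.zeros_like(flat)
        sparse[idx] = flat[idx]
        sparse = sparse.view_as(corrected)
        # stash the dropped part for the next backward pass
        SparsifyGrad.memory[ctx.name] = (corrected - sparse).detach()
        return sparse, None, None  # no gradient for name / k

# usage: y = SparsifyGrad.apply(x, "layer1", 10)
```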

I previously tried to use hooks to override the backpropagation behavior and store these gradients inside the class, but that turned out to be unstable for Conv2d when using CUDA (a rough sketch of what I tried is below).
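
The hook-based version looked roughly like this (`SparseGradConv` and `k` are just names for the sketch): a hook on the weight tensor replaces the gradient the optimizer sees and keeps the residual on the module.

```python
import torch
import torch.nn as nn

class SparseGradConv(nn.Conv2d):
    """Conv2d whose weight gradient is sparsified by a tensor hook,
    with the dropped part kept on the module for the next backward."""

    def __init__(self, *args, k=100, **kwargs):
        super().__init__(*args, **kwargs)
        self.k = k
        self.register_buffer("grad_memory", torch.zeros_like(self.weight))
        self.weight.register_hook(self._sparsify_hook)

    def _sparsify_hook(self, grad):
        corrected = grad + self.grad_memory
        flat = corrected.flatten()
        idx = flat.abs().topk(self.k).indices
        sparse = torch.zeros_like(flat)
        sparse[idx] = flat[idx]
        sparse = sparse.view_as(corrected)
        # keep the dropped part for the next backward pass
        self.grad_memory.copy_((corrected - sparse).detach())
        return sparse  # this is the gradient the optimizer will see
```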