Update internal variable across different forward passes

This looks like some kind of backpropagation through time. Could you consider doing a single backward step for multiple forward steps? See: Correct way to do backpropagation through time? - PyTorch Forums

I guess another question is what it would mean to compute a gradient for that variable across multiple forward passes. It's not really a parameter, as it is being overwritten during every forward pass.
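To illustrate the "single backward for multiple forward steps" idea, here is a minimal sketch of truncated BPTT, assuming the internal variable is a hidden state carried between forward passes (the `RNNCell` setup and shapes here are just for illustration, not from the original question). The state is overwritten each step, but as long as it stays in the autograd graph, one `backward()` propagates through all the steps; detaching it afterwards truncates the graph before the next window.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

cell = nn.RNNCell(input_size=4, hidden_size=8)
opt = torch.optim.SGD(cell.parameters(), lr=0.1)

h = torch.zeros(1, 8)          # internal state, overwritten on every forward pass
inputs = torch.randn(5, 1, 4)  # 5 time steps in this truncation window

loss = torch.zeros(())
for x in inputs:               # multiple forward steps
    h = cell(x, h)             # h is overwritten, but remains in the graph
    loss = loss + h.pow(2).mean()

opt.zero_grad()
loss.backward()                # single backward step through all 5 forward steps
opt.step()

h = h.detach()                 # truncate the graph before the next window
```

Note that `h` itself gets no `.grad` here, which matches the point above: it is an intermediate value, not a parameter. If you did want its gradient for inspection, you could call `h.retain_grad()` before the backward pass.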