How to retain dependency between variables?

I am modeling k-dimensional positions over time t = 0…T, using a set of initial positions Z_0 with requires_grad=True and storing the results for the remaining T-1 time steps in Z with requires_grad=False.

A simple model is Z_t = Z_{t-1} + e, where e is some constant noise. This is optimized in PyTorch using gradient descent, by moving the initial positions accordingly.

The problem is that when using Z to compute subsequent time steps for t > 1, the relation between Z_t and Z_0 is lost, so the model converges significantly slower compared to simply modeling Z_t = Z_0 + t * e, where the dependency between the initial positions and Z_t is retained.
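A toy illustration of the difference in gradient flow (sizes and values are placeholders, and the .detach() stands in for however the stored steps end up untracked in my code):

```python
import torch

k, T, e = 3, 5, 0.1
Z0 = torch.randn(k, requires_grad=True)

# (a) Recurrence where every step is read back from untracked storage:
Z = torch.zeros(T + 1, k)            # requires_grad=False storage
Z[0] = Z0.detach()
for t in range(1, T + 1):
    Z[t] = Z[t - 1] + e              # Z_t = Z_{t-1} + e, but no history back to Z_0
print(Z[T].grad_fn)                  # None -> gradients cannot reach Z_0

# (b) Direct formulation:
Zt = Z0 + T * e                      # Z_t = Z_0 + t * e
print(Zt.grad_fn)                    # AddBackward0 -> gradients flow to Z_0
```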

Note: this model is for illustrative purposes only; the actual models in question are too complex to be defined directly in terms of Z_0 and require the intermediate results stored in Z.

Accumulating gradients or retaining the gradient graph does not help.

Does that mean Z_t (t >= 1) should be detached from the graph? Just calling Z.requires_grad_(False) raises an error, because Z is an intermediate (non-leaf) node that is used to calculate the gradient for Z_0.
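A minimal reproduction of that error, with a made-up intermediate tensor:

```python
import torch

Z0 = torch.randn(3, requires_grad=True)
Z1 = Z0 + 0.1                  # intermediate (non-leaf) tensor

# Raises a RuntimeError (requires_grad flags can only be changed on leaf variables)
Z1.requires_grad_(False)
```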

I’ve tried using Z.data.copy_ to store the intermediate results, in which case Z_t is detached from the graph and no error is raised.
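Roughly like this (a sketch with placeholder sizes; the real per-step computation is more involved than adding e):

```python
import torch

k, T, e = 3, 5, 0.1
Z0 = torch.randn(k, requires_grad=True)
Z = torch.zeros(T + 1, k)

# Writing through .data bypasses autograd, so the stored values carry no graph.
Z.data[0].copy_(Z0)
for t in range(1, T + 1):
    Z.data[t].copy_(Z[t - 1] + e)

print(Z.requires_grad, Z.grad_fn)    # False None -> no path back to Z_0
```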

Alternatively, I’ve used a hidden state to store the intermediate nodes and passed retain_graph=True for the remaining T-1 time steps; unfortunately, the convergence is not even remotely similar to when the relation is modeled directly.
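Roughly along these lines (a loose sketch; the real loss and update schedule are more involved):

```python
import torch

k, T, e = 3, 5, 0.1
Z0 = torch.randn(k, requires_grad=True)
targets = torch.randn(T + 1, k)              # placeholder observations
optimizer = torch.optim.SGD([Z0], lr=1e-2)

hidden = Z0                                  # hidden state carrying the graph across steps
for t in range(1, T + 1):
    hidden = hidden + e                      # Z_t = Z_{t-1} + e, still connected to Z_0
    loss = (hidden - targets[t]).pow(2).sum()
    optimizer.zero_grad()
    loss.backward(retain_graph=True)         # keep the graph alive for later steps
    optimizer.step()
```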
