I was trying to run my code and ran into an error about an in-place operation. I've simplified it to the bare minimum so it's easier to understand. If I understand correctly, autograd doesn't like that I modify the tensor y in-place, even though I don't modify the same elements. Is there a way to allow it? Or is there another way to do what I'm trying to do (use the previous slice and the parameter u to compute the next slice in a differentiable way)?
I hope it’s somewhat clear (it’s my first post).
Thanks
import torch

def dydt(u, y):
    dy = u * y
    return dy

u = torch.ones(1).requires_grad_()
y = torch.zeros(3, 1)

a = dydt(u, y[0])
y[1] = a + y[0]  # in-place write into y
b = dydt(u, y[1])
y[2] = b + y[1]  # another in-place write into y
y[2].backward()  # raises: "one of the variables needed for gradient computation has been modified by an inplace operation"
print(u.grad)
So to compute the gradient of u, PyTorch needs the intermediate versions of y. As the versions (available through the not-official-API-use-at-your-own-risk y._version) are only tracked per tensor and not per entry, the backward of the product u * y complains that y's version differs from what it was during the forward pass.
I'd probably work with a list and torch.stack the result at the end if you need the y tensor. You could also clone y here and there so that the in-place modifications don't hit the y tensor used at critical places in the forward.
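A minimal sketch of the list-and-stack approach, mirroring your example (same dydt and starting values, so the gradient comes out as zero because y starts at zero):

```python
import torch

def dydt(u, y):
    return u * y

u = torch.ones(1).requires_grad_()

# Collect slices in a Python list instead of writing into y in-place.
ys = [torch.zeros(1)]
for _ in range(2):
    ys.append(dydt(u, ys[-1]) + ys[-1])

# Stack into a (3, 1) tensor at the end if you need it as one tensor.
y = torch.stack(ys)

y[2].backward()
# Since y[0] is zero, every dy is zero and u.grad comes out as 0 here,
# but the graph is intact and backward runs without the version error.
print(u.grad)
```

Each ys[-1] is a standalone tensor rather than a view of y, so no saved input is ever overwritten and the version counters stay consistent.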
Which inputs and outputs are saved for the backward is quite operation-specific, but here it is the multiplication (*), and indeed, if you follow the debugging hint in the error message and enable anomaly detection, that is the operation you are pointed at.
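To illustrate, here is your repro wrapped in torch.autograd.set_detect_anomaly (used as a context manager); anomaly mode records a traceback for each forward operation, so the backward error also points at the forward multiplication whose saved input was overwritten:

```python
import torch

def dydt(u, y):
    return u * y

u = torch.ones(1).requires_grad_()
y = torch.zeros(3, 1)

err = None
with torch.autograd.set_detect_anomaly(True):
    a = dydt(u, y[0])
    y[1] = a + y[0]   # in-place write bumps y's version counter
    b = dydt(u, y[1])
    y[2] = b + y[1]
    try:
        y[2].backward()
    except RuntimeError as e:
        err = e
        # Alongside this error, anomaly mode prints the traceback of
        # the forward op (the multiplication) that saved the tensor.
        print("backward failed:", e)
```

Anomaly detection slows the forward pass down, so it is meant for debugging sessions like this rather than for regular training.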