As I understand it, autograd does not allow backpropagation through tensors whose _version attribute is greater than 0. However, when training a linear model, the _version of the learnable parameters does change each time the optimizer's step() function is called, and this normally does not produce any errors. Are exceptions made for nn.Parameter instances? Or for leaves in the computational graph? Or is the mechanism something else entirely?
Take the following example:
    import torch
    from torch import nn

    class MyModule(nn.Module):
        def __init__(self):
            super().__init__()
            # A single learnable scalar parameter.
            self.my_param = nn.Parameter(torch.tensor([1.0], requires_grad=True))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return x * self.my_param

    mm = MyModule()
    optim = torch.optim.Adam(mm.parameters())
    for epoch in range(2):
        optim.zero_grad()
        loss = mm(torch.tensor([2.0]))
        loss.backward()
        optim.step()  # updates my_param in-place

    # Inspect the version counters after two optimizer steps.
    print(mm.my_param._version)
    print(mm(torch.tensor([2.0]))._version)
    print(loss._version)
This gives the following output:
    2
    0
    0
Clearly the _version attribute of my_param does not stay 0.
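To probe this further, here is a minimal sketch that separates two cases: a tensor modified in-place before the forward pass, and a tensor modified in-place between the forward and the backward pass. It assumes that x * w saves x for the backward pass (since the gradient with respect to w depends on x). The names w, x, y, grad_case1, version_before_forward and raised are only illustrative.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

# Case 1: x is modified in-place BEFORE the forward pass.
x = torch.tensor([2.0])
x.add_(1.0)                        # bumps x._version
version_before_forward = x._version
y = x * w                          # x is saved for backward here
y.backward()                       # x is unchanged since it was saved
grad_case1 = w.grad.clone()

# Case 2: x is modified in-place AFTER the forward pass,
# but BEFORE the backward pass.
w.grad = None
x = torch.tensor([2.0])
y = x * w                          # x is saved for backward here
x.add_(1.0)                        # changes x after it was saved
try:
    y.backward()
    raised = False
except RuntimeError as err:
    raised = True
    print(err)
```

If the first case runs cleanly while the second raises, that would suggest the check is not about _version being 0 as such, but about whether a tensor changed between being saved in the forward pass and being used in the backward pass.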
The context in which I run into problems is as follows: I am more or less simulating particles using Newtonian mechanics. I want to train the initial positions, initial velocities, initial accelerations and the masses of the particles such that they end up at a specific position. Currently it works for one epoch, but in the second epoch I get errors that some _version is not 0, and torch.autograd.set_detect_anomaly(True) unfortunately did not produce useful help. It is unclear to me which variables are allowed to change in-place and which are not. If only nn.Parameters are allowed to change in-place, then I can start hunting for a variable that is neither a parameter nor reset at the start of the second epoch. But currently I do not know when in-place operations are allowed, so I find it hard to tell what may go wrong.
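For reference, a heavily simplified sketch of a setup of this kind is given below. It is not the actual simulation code: the class Particle, the names pos0 and vel0, the constant acceleration, the step count, the time step and the learning rate are all hypothetical. It only illustrates one pattern that trains over several epochs: the simulation state is rebuilt from the parameters on every forward call, and every state update is out-of-place.

```python
import torch
from torch import nn

class Particle(nn.Module):  # hypothetical stand-in for the real simulation
    def __init__(self):
        super().__init__()
        self.pos0 = nn.Parameter(torch.tensor([0.0]))  # initial position
        self.vel0 = nn.Parameter(torch.tensor([1.0]))  # initial velocity

    def forward(self, steps: int, dt: float) -> torch.Tensor:
        # The state is rebuilt from the parameters on every call,
        # so nothing from a previous epoch's graph is reused.
        pos, vel = self.pos0, self.vel0
        acc = torch.tensor([-1.0])  # constant acceleration (assumption)
        for _ in range(steps):
            vel = vel + acc * dt    # out-of-place: creates a new tensor
            pos = pos + vel * dt    # out-of-place: creates a new tensor
        return pos

model = Particle()
optim = torch.optim.Adam(model.parameters(), lr=0.1)
target = torch.tensor([2.0])

losses = []
for epoch in range(3):
    optim.zero_grad()
    final_pos = model(steps=10, dt=0.1)
    loss = ((final_pos - target) ** 2).sum()
    loss.backward()
    optim.step()  # in-place parameter update, after backward has finished
    losses.append(loss.item())

print(losses)
print(model.pos0._version, model.vel0._version)
```

In this sketch the only in-place changes are the optimizer's parameter updates, which happen after backward() has finished and before the next forward pass builds a new graph.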