Trouble understanding _version semantics

Nifrec · March 20, 2021, 3:57pm

As I understand it, autograd does not allow backpropagation through tensors whose _version attribute is greater than 0. However, when training a linear model, the _version of the learnable parameters does change each time the optimizer's step() function is called, and this normally does not produce any errors. Are exceptions made for nn.Parameter instances? Or for leaves in the computational graph? Or is the mechanism something else entirely?
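For reference, _version is a per-tensor counter that PyTorch increments whenever the tensor is modified in place; out-of-place operations return new tensors with their own counters. A small sketch of which operations move the counter (the function name version_trace is just for illustration, and the counts are measured relative to the starting value rather than assuming it is 0):

```python
import torch

def version_trace():
    t = torch.zeros(3)
    start = t._version
    trace = []
    t.add_(1.0)        # in-place op on t: counter +1
    trace.append(t._version - start)
    u = t * 2          # out-of-place op: t is untouched, u is a brand-new tensor
    trace.append(t._version - start)
    view = t[:2]       # basic slicing returns a view that shares t's version counter
    view.mul_(3.0)     # in-place op on the view is counted on t as well
    trace.append(t._version - start)
    return trace, u._version

print(version_trace())
```

The u._version of 0 matches the second line of the output below, where the result of a forward pass is also a fresh tensor.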

Take the following example:

import torch
import torch.nn as nn


class MyModule(nn.Module):

    def __init__(self):
        super().__init__()
        # nn.Parameter sets requires_grad=True by default
        self.my_param = nn.Parameter(torch.tensor([1.0]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.my_param


mm = MyModule()
optim = torch.optim.Adam(mm.parameters())

for epoch in range(2):
    optim.zero_grad()
    loss = mm(torch.tensor([2.0]))
    loss.backward()
    optim.step()
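print(mm.my_param._version)              # the parameter, updated in place by optim.step()
print(mm(torch.tensor([2.0]))._version)  # output of a fresh forward pass
print(loss._version)                     # output of the last forward pass in the loop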

This gives the following output:

2
0
0

So the _version attribute of my_param clearly does not stay at 0, yet backward() raises no error in the second iteration.
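PyTorch's autograd mechanics notes describe the check somewhat differently from the assumption at the top: when an operation saves a tensor for backward, the tensor's current _version is recorded with it, and backward() raises an error only if the version has changed since then. If that is right, an in-place update after backward() (which is where optim.step() sits) should be harmless, while one between the forward pass and backward() should fail. A sketch to test this, with a hand-written update standing in for the optimizer (both function names are hypothetical):

```python
import torch

def update_between_iterations() -> int:
    """Update a leaf in place *after* backward(), as optimizer.step() does."""
    w = torch.nn.Parameter(torch.tensor([1.0]))
    start = w._version
    for _ in range(2):
        loss = (w * w).sum()       # new graph each iteration; saves w at its current version
        loss.backward()            # saved version == current version, so no error
        with torch.no_grad():
            w.sub_(0.1 * w.grad)   # in-place update: bumps w._version
        w.grad = None
    return w._version - start      # two bumps, and no error was raised

def update_before_backward() -> str:
    """Update a leaf in place *between* the forward pass and backward()."""
    w = torch.nn.Parameter(torch.tensor([1.0]))
    loss = (w * w).sum()           # the multiplication saves w for backward
    with torch.no_grad():
        w.add_(1.0)                # w's version now differs from the saved one
    try:
        loss.backward()            # autograd detects the mismatch
    except RuntimeError as err:
        return str(err)
    return "no error"

print(update_between_iterations())
print(update_before_backward())
```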

The context in which I run into problems is as follows: I am more or less simulating particles using Newtonian mechanics. I want to train the initial positions, initial velocities, initial accelerations and masses of the particles so that they end up at a specific position. Currently it works for one epoch, but in the second epoch I get errors saying that some _version is not 0, and torch.autograd.set_detect_anomaly(True) unfortunately did not produce anything helpful.

What is unclear to me is which variables are allowed to change in-place and which are not. If only nn.Parameters are allowed to change in-place, I can start hunting for a variable that is neither a parameter nor reset at the start of the second epoch. But since I do not know when in-place operations are allowed, I find it hard to tell what might be going wrong.
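One pattern that would produce a version error only from the second epoch onward is a non-leaf tensor derived from a parameter that is built once before the loop and reused in every epoch. The sketch below uses a hypothetical 1-D particle. The parameter names, force, target, step count, and the hand-written update standing in for optimizer.step() are all made up and are not taken from the actual simulation. Epoch 0 backpropagates through the shared tensor while the saved mass still has its original version. After the in-place update, epoch 1 reaches the same saved mass at a newer version. The reuse branch passes retain_graph=True; without it, the second epoch would instead fail with a "backward through the graph a second time" error.

```python
import torch
import torch.nn as nn

def train(build_state_once: bool, epochs: int = 2, lr: float = 0.01, dt: float = 0.1) -> str:
    # Hypothetical trainable quantities for a single 1-D particle.
    init_pos = nn.Parameter(torch.tensor([0.0]))
    init_vel = nn.Parameter(torch.tensor([1.0]))
    mass = nn.Parameter(torch.tensor([2.0]))
    params = [init_pos, init_vel, mass]
    force = torch.tensor([1.0])   # assumed constant external force
    target = torch.tensor([1.0])  # assumed target position

    def first_velocity():
        # force / mass saves `mass` for backward, together with its current _version
        return init_vel + force / mass * dt

    shared_vel = first_velocity()  # graph built once, before the training loop
    for epoch in range(epochs):
        vel = shared_vel if build_state_once else first_velocity()
        pos = init_pos
        for _ in range(5):         # explicit Euler steps, all out-of-place
            pos = pos + vel * dt
        loss = ((pos - target) ** 2).sum()
        try:
            loss.backward(retain_graph=build_state_once)
        except RuntimeError as err:
            return f"epoch {epoch}: {err}"
        with torch.no_grad():      # stands in for optimizer.step()
            for p in params:
                p.sub_(lr * p.grad)  # in-place update: bumps p._version
                p.grad = None
    return "ok"

print(train(build_state_once=False))
print(train(build_state_once=True))
```

Rebuilding every parameter-derived tensor inside the loop (the build_state_once=False path) gives each epoch a fresh graph, so in this sketch it runs without errors.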