Detach and .data

In previous versions we did something like:

for p in model.parameters():
    p.data.add_(-lr, p.grad.data)

The migration guide says that using .data is now unsafe, so how do I rewrite this using .detach()?

Is it correct to do the following:

for p in model.parameters():
    p = (p - lr * p.grad).detach()

Hi,

for p in model.parameters():
    p = (p - lr * p.grad).detach()

**No! Don’t do that!** Apologies for the bold font. If you assign p, you will just overwrite the name and the model parameter will be unchanged.
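
To see this concretely, here is a small self-contained sketch; the tiny nn.Linear model, the fake data and the lr value are just placeholders for illustration:

import torch
import torch.nn as nn

model = nn.Linear(2, 1)   # toy model, purely for demonstration
lr = 0.1

before = [p.detach().clone() for p in model.parameters()]

# fabricate some non-zero gradients
model(torch.randn(4, 2)).sum().backward()

for p in model.parameters():
    # this builds a NEW tensor and rebinds the local name `p` to it;
    # the parameter stored inside `model` is never touched
    p = (p - lr * p.grad).detach()

after = list(model.parameters())
print(all(torch.equal(b, a) for b, a in zip(before, after)))  # True: nothing changed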

Now, if you look at the source code of an optimizer, say SGD, you find that the update rule is still more or less what your original post uses:

for p in model.parameters():
    p.data.add_(-lr, p.grad.data)

so that will work because everyone still uses it.
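
Just for illustration, a stripped-down, hypothetical optimizer in that spirit could look like the following; PlainSGD is a made-up name, this is not the actual torch.optim.SGD source, and momentum, dampening, weight decay and so on are all left out:

import torch

class PlainSGD(torch.optim.Optimizer):
    # hypothetical minimal optimizer, only here to show where the update rule lives
    def __init__(self, params, lr):
        super().__init__(params, dict(lr=lr))

    def step(self):
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                # essentially the update from the original post
                p.data.add_(p.grad.data, alpha=-group['lr'])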
That said, I personally think that using torch.no_grad()

with torch.no_grad():
    for p in model.parameters():
        # in-place update; autograd does not record it inside no_grad()
        p.add_(p.grad, alpha=-lr)

is a better way to achieve the same.
But take this last advice with a grain of salt: I’m biased because I’m trying to convince people that there should be an in-place hook for some caching-style applications.
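
Put together, a minimal end-to-end sketch of that manual update inside a training loop might look like this; the toy model, data and learning rate are made up for illustration:

import torch
import torch.nn as nn

model = nn.Linear(10, 1)                        # toy model
x, y = torch.randn(32, 10), torch.randn(32, 1)  # toy data
lr = 0.01

for step in range(100):
    loss = nn.functional.mse_loss(model(x), y)
    model.zero_grad()
    loss.backward()

    # manual SGD step; no_grad() keeps autograd from recording the in-place update
    with torch.no_grad():
        for p in model.parameters():
            p.add_(p.grad, alpha=-lr)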

As a general rule, I try to consult the PyTorch source code as an example as much as I can, and so far, I seem to be doing OK with that strategy.

Best regards

Thomas

**No! Don’t do that!** Apologies for the bold font. If you assign p, you will just overwrite the name and the model parameter will be unchanged.

May I ask why that is? Thanks!

Fundamentally that is how Python variables work.

  • A plain assignment =, as in p = foo, just means “take whatever is on the right-hand side and give it the name on the left-hand side (p)”. There isn’t any special method (some __something__) called when you do that.
  • This is in stark contrast to the superficially similar p[something] = ... or x.p = ..., which call p.__setitem__ and x.__setattr__ under the hood (see the sketch after this list).
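
A plain-Python sketch of the difference (nothing PyTorch-specific here, and the Box class is made up for the example):

class Box:
    def __setattr__(self, name, value):
        print(f"__setattr__ called for {name}")
        super().__setattr__(name, value)

x = Box()
x.p = 1          # goes through Box.__setattr__, so the object is notified

d = {"p": 1}
d["p"] = 2       # goes through dict.__setitem__, so the container is notified

p = d["p"]
p = 42           # plain assignment: only rebinds the local name `p`;
                 # d["p"] is still 2 and no special method runs
print(d["p"])    # prints 2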

Best regards

Thomas

OK, that’s really clear. Thanks!