Detach and .data

Hi,

for p in model.parameters():
    p = (p - lr * p.grad).detach()

No! Don’t do that! Apologies for the bold font. If you assign to p, you only rebind the local name, and the model parameter itself stays unchanged.
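
For what it's worth, here is a minimal, self-contained sketch (mine, not from your post; the tiny model and the throwaway backward pass are just placeholders) showing that the assignment only rebinds the local name p while the parameters inside the model stay untouched:

import torch

model = torch.nn.Linear(2, 1)
lr = 0.1

model(torch.randn(4, 2)).sum().backward()   # populate .grad with some gradients

before = [p.clone() for p in model.parameters()]
for p in model.parameters():
    p = (p - lr * p.grad).detach()   # only rebinds the local name p

print(torch.equal(before[0], next(model.parameters())))   # True: nothing changed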

Now, if you look at the source code of an optimizer, say SGD, you find that the update rule is still more or less what your original post uses:

for p in model.parameters():
    p.data.add_(p.grad.data, alpha=-lr)

so that will work, and it will keep working because everyone still uses it.
That said, I personally think that using torch.no_grad()

with torch.no_grad():
    for p in model.parameters():
        p.add_(p.grad, alpha=-lr)

is a better way to achieve the same.
But take this last advice with a grain of salt: I’m biased because I’m trying to convince people that there should be an in-place hook for some caching-style applications.
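
To put that in context, here is how a complete manual update step could look with torch.no_grad(); the tiny model, fake data, and loss below are made-up placeholders, not anything from your code:

import torch

model = torch.nn.Linear(2, 1)
lr = 0.1
x, y = torch.randn(8, 2), torch.randn(8, 1)

loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()

with torch.no_grad():
    for p in model.parameters():
        p.add_(p.grad, alpha=-lr)   # in-place SGD step, invisible to autograd
        p.grad.zero_()              # clear the gradient for the next iteration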

As a general rule, I try to consult the PyTorch source code as an example as much as I can, and so far, I seem to be doing OK with that strategy.

Best regards

Thomas
