Note in the torch.optim package docs

"If you need to move a model to GPU via .cuda(), please do so before constructing optimizers for it. Parameters of a model after .cuda() will be different objects with those before the call.

In general, you should make sure that optimized parameters live in consistent locations when optimizers are constructed and used."

We tried to test this using the code below, but for us it shows that the pre and post weights are different, so we believe the optimizer is still working even if .cuda() is called after creating the optimizer object. So what does this note actually affect?

import torch
import numpy as np
import torch.nn as nn
import torch.optim as optim

net = nn.Linear(1, 2)
net1 = nn.Linear(1, 2)
net1.load_state_dict(net.state_dict())
pre, pre1 = net.weight.clone(), net1.weight.clone()
net.cuda()
optimizer = optim.SGD(net.parameters(), lr=10)
optimizer_1 = optim.SGD(net1.parameters(), lr=10)
net1.cuda()
inp = torch.randn(1, 1).cuda()
out = torch.randn(1, 2).cuda()

loss = torch.nn.functional.mse_loss(net(inp), out)
loss1 = torch.nn.functional.mse_loss(net1(inp), out)

optimizer.zero_grad()
loss.backward()
optimizer.step()
optimizer_1.zero_grad()
loss1.backward()
optimizer_1.step()

post, post1 = net.weight.clone(), net1.weight.clone()
print(pre, pre1)
print(post, post1)
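
As an extra check (assuming the default behaviour where .cuda() replaces param.data in place rather than the Parameter objects themselves, which is our reading and not something the docs state), the optimizer still seems to hold the very same Parameter objects after the move:

print(optimizer_1.param_groups[0]['params'][0] is net1.weight)  # True for us
print(net1.weight.is_cuda)  # True, so SGD updates the moved parameter directly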

It doesn’t affect all the optimizers.
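
For reference, here is a minimal toy sketch (not one of the built-in optimizers, and it assumes a CUDA device is available) of the failure mode the note is guarding against: if an optimizer allocates its per-parameter state in the constructor, that state lives on whatever device the parameters were on at construction time, so moving the model to GPU afterwards leaves the state behind and step() hits a device mismatch. IIRC that is roughly what the one or two affected optimizers did.

import torch
import torch.nn as nn
from torch.optim import Optimizer

class EagerStateSGD(Optimizer):
    """Toy SGD-with-momentum whose state is allocated in __init__."""
    def __init__(self, params, lr=0.1, momentum=0.9):
        super().__init__(params, dict(lr=lr, momentum=momentum))
        for group in self.param_groups:
            for p in group['params']:
                # Created here, on whatever device p is on *right now*.
                self.state[p]['momentum_buffer'] = torch.zeros_like(p.data)

    def step(self, closure=None):
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                buf = self.state[p]['momentum_buffer']
                buf.mul_(group['momentum']).add_(p.grad.data)
                p.data.add_(-group['lr'] * buf)

net = nn.Linear(1, 2)
opt = EagerStateSGD(net.parameters())   # state allocated on CPU
net.cuda()                              # parameters move, the state does not
loss = net(torch.randn(1, 1).cuda()).sum()
loss.backward()
opt.step()  # fails with a device-mismatch RuntimeError on the CPU buffer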

Would mentioning which optimizers it affects be helpful in the docs? If so, I can check which optimizers are affected and submit a pull request to update the docs.

Right now only one or two are affected IIRC, and those can be worked around. I’d be happy to accept a PR that “fixes” those optimizers and removes that note from the docs. :slight_smile:
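
Roughly speaking (just a pointer, not the final design), the fix is to allocate the state lazily on the first step() instead of in __init__, the way SGD creates its momentum buffer, so the state ends up on whatever device the parameter is on at that point:

import torch
from torch.optim import Optimizer

class LazyStateSGD(Optimizer):
    """Same toy optimizer as above, but the state is created on first step()."""
    def __init__(self, params, lr=0.1, momentum=0.9):
        super().__init__(params, dict(lr=lr, momentum=momentum))

    def step(self, closure=None):
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                state = self.state[p]
                if len(state) == 0:
                    # Allocated here, so it lands on p's *current* device,
                    # even if the model was moved after the optimizer was built.
                    state['momentum_buffer'] = torch.zeros_like(p.data)
                buf = state['momentum_buffer']
                buf.mul_(group['momentum']).add_(p.grad.data)
                p.data.add_(-group['lr'] * buf)

With that pattern, constructing the optimizer before or after .cuda() gives the same behaviour, which I believe is why SGD in your test above works either way.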

Cool, I will try to work on that. Do you have any pointers for getting started?

Thanks