Note in the torch.optim package docs

"If you need to move a model to GPU via .cuda(), please do so before constructing optimizers for it. Parameters of a model after .cuda() will be different objects with those before the call.

In general, you should make sure that optimized parameters live in consistent locations when optimizers are constructed and used."

We tried to test this using the code below, but for us it shows that the pre and post weights are different, so we believe the optimizer is still working even if .cuda() is called after creating the optimizer object. So what does this note actually affect?

import torch
import numpy as np
import torch.nn as nn
import torch.optim as optim

net = nn.Linear(1, 2)
net1 = nn.Linear(1, 2)
net1.load_state_dict(net.state_dict())
pre, pre1 = net.weight.clone(), net1.weight.clone()
net.cuda()
optimizer = optim.SGD(net.parameters(), lr=10)
optimizer_1 = optim.SGD(net1.parameters(), lr=10)
net1.cuda()
inp = torch.randn(1, 1).cuda()
out = torch.randn(1, 2).cuda()

loss = torch.nn.functional.mse_loss(net(inp), out)
loss1 = torch.nn.functional.mse_loss(net1(inp), out)

optimizer.zero_grad()
loss.backward()
optimizer.step()
optimizer_1.zero_grad()
loss1.backward()
optimizer_1.step()

post, post1 = net.weight.clone(), net1.weight.clone()
print(pre, pre1)
print(post, post1)
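
As an extra check (assuming the default behaviour where .cuda() replaces param.data in place rather than the Parameter objects themselves, which is our reading and not something the docs state), the optimizer still seems to hold the very same Parameter objects after the move:

print(optimizer_1.param_groups[0]['params'][0] is net1.weight)  # True for us
print(net1.weight.is_cuda)  # True, so SGD updates the moved parameter directly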

It doesn’t affect all the optimizers.
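
For reference, here is a minimal toy sketch (not one of the built-in optimizers, and it assumes a CUDA device is available) of the failure mode the note is guarding against: if an optimizer allocates its per-parameter state in the constructor, that state lives on whatever device the parameters were on at construction time, so moving the model to GPU afterwards leaves the state behind and step() hits a device mismatch. IIRC that is roughly what the one or two affected optimizers did.

import torch
import torch.nn as nn
from torch.optim import Optimizer

class EagerStateSGD(Optimizer):
    """Toy SGD-with-momentum whose state is allocated in __init__."""
    def __init__(self, params, lr=0.1, momentum=0.9):
        super().__init__(params, dict(lr=lr, momentum=momentum))
        for group in self.param_groups:
            for p in group['params']:
                # Created here, on whatever device p is on *right now*.
                self.state[p]['momentum_buffer'] = torch.zeros_like(p.data)

    def step(self, closure=None):
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                buf = self.state[p]['momentum_buffer']
                buf.mul_(group['momentum']).add_(p.grad.data)
                p.data.add_(-group['lr'] * buf)

net = nn.Linear(1, 2)
opt = EagerStateSGD(net.parameters())   # state allocated on CPU
net.cuda()                              # parameters move, the state does not
loss = net(torch.randn(1, 1).cuda()).sum()
loss.backward()
opt.step()  # fails with a device-mismatch RuntimeError on the CPU buffer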

Would mentioning which optimizers it affects be helpful in the docs? If so, I can check which optimizers are affected and submit a pull request to update the docs.

Right now only one or two are affected IIRC, and those can be worked around. I’d be happy to accept a PR that “fixes” those optimizers and removes that note from the docs. :slight_smile:
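
Roughly speaking (just a pointer, not the final design), the fix is to allocate the state lazily on the first step() instead of in __init__, the way SGD creates its momentum buffer, so the state ends up on whatever device the parameter is on at that point:

import torch
from torch.optim import Optimizer

class LazyStateSGD(Optimizer):
    """Same toy optimizer as above, but the state is created on first step()."""
    def __init__(self, params, lr=0.1, momentum=0.9):
        super().__init__(params, dict(lr=lr, momentum=momentum))

    def step(self, closure=None):
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                state = self.state[p]
                if len(state) == 0:
                    # Allocated here, so it lands on p's *current* device,
                    # even if the model was moved after the optimizer was built.
                    state['momentum_buffer'] = torch.zeros_like(p.data)
                buf = state['momentum_buffer']
                buf.mul_(group['momentum']).add_(p.grad.data)
                p.data.add_(-group['lr'] * buf)

With that pattern, constructing the optimizer before or after .cuda() gives the same behaviour, which I believe is why SGD in your test above works either way.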

Cool, I will try to work on that. Do you have any pointers for getting started?

Thanks