"If you need to move a model to GPU via .cuda(), please do so before constructing optimizers for it. Parameters of a model after .cuda() will be different objects with those before the call.
In general, you should make sure that optimized parameters live in consistent locations when optimizers are constructed and used."
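For reference, this is our reading of the order the note recommends (a minimal sketch; the model and learning rate are just placeholders):

import torch.nn as nn
import torch.optim as optim

# Recommended order per the note: move the model to the GPU first,
# then construct the optimizer over the (now-CUDA) parameters.
model = nn.Linear(1, 2).cuda()
optimizer = optim.SGD(model.parameters(), lr=0.1)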
We tried to test this with the code below, but it shows that the pre- and post-step weights differ in both cases, so the optimizer appears to keep working even when .cuda() is called after the optimizer object is created. So what does this note actually affect?
import torch
import torch.nn as nn
import torch.optim as optim

net = nn.Linear(1, 2)
net1 = nn.Linear(1, 2)
net1.load_state_dict(net.state_dict())  # start both nets from identical weights

# snapshot the weights before the update step
pre, pre1 = net.weight.detach().clone(), net1.weight.detach().clone()

# net: .cuda() before the optimizer is constructed (the documented order)
net.cuda()
optimizer = optim.SGD(net.parameters(), lr=10)

# net1: .cuda() after the optimizer is constructed (the order the note warns about)
optimizer_1 = optim.SGD(net1.parameters(), lr=10)
net1.cuda()

inp = torch.randn(1, 1).cuda()
target = torch.randn(1, 2).cuda()

loss = torch.nn.functional.mse_loss(net(inp), target)
loss1 = torch.nn.functional.mse_loss(net1(inp), target)

optimizer.zero_grad()
loss.backward()
optimizer.step()

optimizer_1.zero_grad()
loss1.backward()
optimizer_1.step()

# snapshot the weights after one update step
post, post1 = net.weight.detach().clone(), net1.weight.detach().clone()
print(pre, pre1)
print(post, post1)
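One way to probe what the note is about, if we understand it correctly, is to check whether optimizer_1 still holds the very same Parameter objects after net1.cuda(). This sketch compares object identities:

# Sketch: does optimizer_1 still reference the exact Parameter objects of net1
# after net1.cuda()? If .cuda() moved the parameters in place, this prints True.
ids_in_opt = {id(p) for group in optimizer_1.param_groups for p in group['params']}
ids_in_net = {id(p) for p in net1.parameters()}
print(ids_in_opt == ids_in_net)

We would expect True here, which would explain why plain SGD keeps working in our test above.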