Time to init an optimizer

Suppose I define a tunable tensor. To tune it on CUDA, I take 3 steps:

  1. move it to CUDA
  2. call .requires_grad_() on it
  3. init an optimizer for it

Is my order right? Can I switch the step order?

Hi @Yoo,

I think you can swap steps 1 and 2 around (as long as the tensor you hand to the optimizer is still a leaf), but step 3 has to come last: if you initialized the optimizer while the parameters were on the CPU and then moved them to CUDA, the tensor you compute with (and its .grad attribute) would be on the GPU, while the optimizer would still be holding the old CPU parameters, which would never get updated.
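
For example, here is a minimal sketch of that order (the scalar tensor and the learning rate are just placeholders):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 1. create the tensor and move it to the target device first
a = torch.tensor([1.0]).to(device)

# 2. mark it as trainable after the move, so it is still a leaf tensor
a.requires_grad_()

# 3. build the optimizer last, so it holds a reference to the on-device leaf
opt = torch.optim.SGD([a], lr=0.01)

Since a = a.to(device) returns a new tensor object, an optimizer created before that line would still be pointing at the old CPU tensor.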

Also, sharing a minimal reproducible example will make your problem easier to diagnose!

I tried three orders:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Case 1:

a0 = torch.FloatTensor([1]).clone().detach().requires_grad_(True) # require_grad
print("leaf", a0.is_leaf)
a0 = a0.to(device) # move to CUDA
print("leaf", a0.is_leaf)
opt = torch.optim.SGD([a0], lr = 0.01) # init opt

Case 2:

a0 = torch.FloatTensor([1]).clone().detach()
print("leaf", a0.is_leaf)
a0 = a0.to(device) # move to CUDA
a0.requires_grad_() # require_grad
print("leaf", a0.is_leaf)
opt = torch.optim.SGD([a0], lr = 0.01) # init opt

Case 3:

a0 = torch.FloatTensor([1]).clone().detach().requires_grad_(True) # require_grad
print("leaf", a0.is_leaf)
opt = torch.optim.SGD([a0], lr = 0.01) # init opt
a0 = a0.to(device) # move to CUDA
print("leaf", a0.is_leaf)

Case 1 gave an error, since calling .to(device) on a tensor that already requires grad returns a non-leaf copy, and the optimizer only accepts leaf tensors.
Cases 2 and 3 ran without errors, but I was not sure whether the optimizer in case 3 knows to update the a0 on the device rather than the original one on the CPU?
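
As a rough check (just a sketch, assuming a CUDA device is actually available so that .to(device) really returns a new tensor), I could compare the tensor stored inside the optimizer with the a0 from case 3:

p = opt.param_groups[0]["params"][0] # the tensor the optimizer would actually update
print(p is a0) # expected False: a0 was rebound to a new tensor by .to(device)
print(p.device, a0.device) # expected cpu vs. cuda:0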