Suppose I define a tunable tensor. In order to tune it in CUDA, I have 3 steps:
- move it to CUDA
- add `.requires_grad_()` to it
- init an optimizer for it

Is my order right? Can I switch the step order?
Hi @Yoo,
I think you can swap steps 1 and 2 around, but you can't swap step 3: if you had the parameters on the CPU, initialized them in an optimizer, and then moved them to CUDA, your update (the `.grad` attribute) would be on the GPU but the parameters would be on the CPU.
Also, sharing a minimal reproducible example will help illustrate your problem more easily!
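For reference, a minimal sketch of keeping the optimizer initialization as the last step might look like this (the first two steps follow the order in the original post; the tensor name and learning rate are only placeholders):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Step 1: move the tensor to the target device (it is still a leaf here).
a0 = torch.tensor([1.0]).to(device)

# Step 2: mark it as trainable in place; it remains a leaf tensor.
a0.requires_grad_()

# Step 3: create the optimizer last, so it holds the device-resident leaf.
opt = torch.optim.SGD([a0], lr=0.01)

loss = (a0 ** 2).sum()
loss.backward()
opt.step()  # updates the same tensor that the forward pass used
print(a0.device, a0.grad.device)  # both on the same device
```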
I tried three orders:
```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```

Case 1:

```python
a0 = torch.FloatTensor([1]).clone().detach().requires_grad_(True)  # require_grad
print("leaf", a0.is_leaf)
a0 = a0.to(device)  # move to CUDA
print("leaf", a0.is_leaf)
opt = torch.optim.SGD([a0], lr=0.01)  # init opt
```
Case 2:

```python
a0 = torch.FloatTensor([1]).clone().detach()
print("leaf", a0.is_leaf)
a0 = a0.to(device)  # move to CUDA
a0.requires_grad_()  # require_grad
print("leaf", a0.is_leaf)
opt = torch.optim.SGD([a0], lr=0.01)  # init opt
```
Case 3:

```python
a0 = torch.FloatTensor([1]).clone().detach().requires_grad_(True)  # require_grad
print("leaf", a0.is_leaf)
opt = torch.optim.SGD([a0], lr=0.01)  # init opt
a0 = a0.to(device)  # move to CUDA
print("leaf", a0.is_leaf)
```
Case 1 gave an error since `.to(device)` made the tensor a non-leaf, and the optimizer can't optimize a non-leaf tensor.
Cases 2 and 3 worked, but I was not sure whether the optimizer in case 3 knew to update `a0` on the device instead of on the CPU.
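One way to check what the optimizer in case 3 actually references is to inspect `opt.param_groups`; a minimal sketch, assuming a CUDA device is available (names follow the snippet above):

```python
import torch

device = torch.device("cuda")  # assumes a CUDA device is available

# Case 3 order again: require_grad, init the optimizer, then move to CUDA.
a0 = torch.tensor([1.0], requires_grad=True)
opt = torch.optim.SGD([a0], lr=0.01)
a0 = a0.to(device)  # rebinds the name a0 to a new, non-leaf GPU tensor

param = opt.param_groups[0]["params"][0]
print(param is a0)   # False: the optimizer still references the original CPU leaf
print(param.device)  # cpu  -> this is the tensor opt.step() would update
print(a0.device)     # cuda:0 -> this copy is what a forward pass would use
```

So in this ordering the optimizer would step the CPU tensor rather than the GPU copy used in the forward pass.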