Is it possible to send a tunable parameter to CUDA before defining the optimizer for it?
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
param = torch.FloatTensor().clone().detach().requires_grad_(True)
param = param.to(device)
opt = torch.optim.SGD([param], lr=0.01)
ValueError: can't optimize a non-leaf Tensor
The to() operation is differentiable and thus creates a non-leaf tensor. Move the data to the device first and create the leaf tensor afterwards.
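A minimal sketch of that fix (the shape and torch.zeros are placeholder assumptions, not from the original post): the tensor is created directly on the target device and only then marked as requiring gradients, so no differentiable op runs on it and it stays a leaf that the optimizer accepts.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create the tensor on the target device first, then enable gradients;
# since no differentiable operation has run on it, it remains a leaf.
param = torch.zeros(3, device=device).requires_grad_(True)

opt = torch.optim.SGD([param], lr=0.01)  # no ValueError now
print(param.is_leaf)
```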
Suppose I define a tensor that I want to tune on CUDA. I have 3 steps:

1. define the tensor
2. move it to CUDA
3. call .requires_grad_() on it
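These steps can be sketched as follows (the shape and torch.zeros are placeholder assumptions); because gradients are not enabled until after the move, the .to() call does not create a non-leaf tensor:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Step 1: define the tensor (arbitrary example values)
param = torch.zeros(3)
# Step 2: move it to the device; no gradients yet, so the result is still a leaf
param = param.to(device)
# Step 3: enable gradients in place on the moved tensor
param.requires_grad_(True)

print(param.is_leaf)
```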
When is the proper time to define my optimizer for it? Between 1 and 2? Does the order matter?