Optimizing over .cuda() variables

Hi everyone. Whenever I instantiate a Variable() and call .cuda() on it, I seem to be unable to create an optimizer that optimizes over it:

import torch
import torch.optim as optim
from torch.autograd import Variable

x = Variable(torch.FloatTensor(some_np_array), requires_grad=True)
x = x.cuda()
optimizer = optim.SGD([x], lr=1e-2)

throws the exception

ValueError: can't optimize a non-leaf Variable

whereas not calling .cuda() works fine. What is the correct way to optimize over variables that can be processed on the GPU?

Hi,

The problem is that when you do x = x.cuda(), the new x is not the same Variable as the old one: .cuda() is an operation tracked by autograd, so its output is a non-leaf Variable, and the optimizer only accepts leaf Variables (ones you created directly).
If you do x_cuda = x.cuda() instead, then you can give x, the leaf, to the optimizer.
It is even better to send the tensor to cuda before creating the Variable:

x = Variable(torch.FloatTensor(some_np_array).cuda(), requires_grad=True)
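For completeness, here is a minimal end-to-end sketch of that pattern. The array, the learning rate, and the squared-norm objective are made up for illustration, and it assumes a CUDA device is available:

import numpy as np
import torch
import torch.optim as optim
from torch.autograd import Variable

some_np_array = np.random.randn(10).astype(np.float32)  # hypothetical data

# Create the Variable from a tensor that is already on the GPU, so it is a leaf
x = Variable(torch.FloatTensor(some_np_array).cuda(), requires_grad=True)
optimizer = optim.SGD([x], lr=1e-2)

for step in range(100):
    optimizer.zero_grad()
    loss = (x ** 2).sum()  # toy objective: pull x towards zero
    loss.backward()
    optimizer.step()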

Hi, I met the same problem. Thanks for your tips.
I wonder what the difference between x and x_cuda is.
Indeed, when our computation runs on the GPU, why don't we optimize x_cuda, since that seems to make more sense?
Thanks.

That is what the last suggestion above says: send the tensor to the GPU before wrapping it in a Variable, so that you can optimize that Variable directly.

@albanD I don't understand why we should give the optimizer the CPU Variable x instead of x_cuda.
Our computation is performed on x_cuda, so why not pass x_cuda to the optimizer?

Hi,

It all depends on how you create them. Check this post for a detailed answer.
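To make the leaf/non-leaf distinction concrete, here is a small sketch. It inspects grad_fn, which is None exactly for leaf Variables (the attribute name may differ in very old versions):

import torch
from torch.autograd import Variable

x = Variable(torch.FloatTensor(3), requires_grad=True)  # leaf: created directly by the user
x_cuda = x.cuda()                                       # non-leaf: the output of the .cuda() operation

print(x.grad_fn)       # None, so the optimizer accepts it
print(x_cuda.grad_fn)  # a backward function (e.g. CopyBackwards), so it is rejected

Note that with the x_cuda = x.cuda() pattern, the copy has to be redone after every optimizer step so that the update to x is visible on the GPU; creating the Variable from a CUDA tensor in the first place avoids that extra bookkeeping.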