How to get the gradient of a CUDA Variable?

Why is the result of the CPU version different from the GPU version? How do we get the gradient for a CUDA Variable? Thanks.

CPU version:

import torch
from torch.autograd import Variable

l = torch.nn.Linear(6, 1)
input = Variable(torch.rand(10, 6), requires_grad=True)
out = l(input)
target = Variable(torch.rand(10, 1))
crt = torch.nn.L1Loss()
loss = crt(out, target)
loss.backward()
print(input.grad)

Output:

Variable containing:
1.00000e-02 *
-1.5130 -3.3551 1.2752 1.4854 -0.3192 2.7163
1.5130 3.3551 -1.2752 -1.4854 0.3192 -2.7163
-1.5130 -3.3551 1.2752 1.4854 -0.3192 2.7163
1.5130 3.3551 -1.2752 -1.4854 0.3192 -2.7163
1.5130 3.3551 -1.2752 -1.4854 0.3192 -2.7163
1.5130 3.3551 -1.2752 -1.4854 0.3192 -2.7163
-1.5130 -3.3551 1.2752 1.4854 -0.3192 2.7163
-1.5130 -3.3551 1.2752 1.4854 -0.3192 2.7163
-1.5130 -3.3551 1.2752 1.4854 -0.3192 2.7163
1.5130 3.3551 -1.2752 -1.4854 0.3192 -2.7163
[torch.FloatTensor of size 10x6]

GPU version:

l = torch.nn.Linear(6, 1).cuda()
input = Variable(torch.rand(10, 6), requires_grad=True).cuda()
out = l(input)
target = Variable(torch.rand(10, 1)).cuda()
crt = torch.nn.L1Loss().cuda()
loss = crt(out, target)
loss.backward()
print(input.grad)

Output: None


Calling .cuda() on a Variable gives you a new Variable whose .grad is not populated, so input.grad stays None. So instead of

input = Variable(torch.rand(10,6), requires_grad=True).cuda()

use

input = Variable(torch.rand(10,6).cuda(), requires_grad=True)
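
Applied to the GPU snippet from the question, the whole thing would look like this (same code as above, just with .cuda() called on the tensor before it is wrapped in the Variable):

import torch
from torch.autograd import Variable

l = torch.nn.Linear(6, 1).cuda()
# .cuda() on the tensor first, so the Variable itself is what collects .grad
input = Variable(torch.rand(10, 6).cuda(), requires_grad=True)
out = l(input)
target = Variable(torch.rand(10, 1).cuda())
crt = torch.nn.L1Loss().cuda()
loss = crt(out, target)
loss.backward()
print(input.grad)  # should now print a 10x6 CUDA tensor instead of None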

Best regards

Thomas


Thank you so much!

-Fei

I just stumbled into the same problem. Can somebody explain the logic behind it? Seems very counter-intuitive.

explained here: Strange behavior of Variable.cuda() and Variable.grad
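
To see the logic in current PyTorch terms (where plain tensors have taken over the role of Variables): by default only leaf tensors, i.e. tensors you created yourself rather than results of operations, have their .grad filled in by backward(). .cuda() counts as an operation, so the Variable/tensor it returns is a non-leaf and its .grad stays None. A minimal sketch, assuming a recent PyTorch and an available GPU:

import torch

x = torch.rand(10, 6, requires_grad=True)  # leaf tensor: backward() fills x.grad
y = x.cuda()                               # result of an op on x: non-leaf, y.grad stays None
print(x.is_leaf, y.is_leaf)                # True False

# Creating the tensor on the GPU directly makes it the leaf, mirroring the fix above.
z = torch.rand(10, 6, device="cuda", requires_grad=True)
print(z.is_leaf)                           # True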