Tensor.cuda() vs Variable.cuda()

Hello, would anyone have any idea why gradcheck passes when the tensor is moved to CUDA before being wrapped in a Variable, but fails when the Variable is created first and then moved with .cuda()?

import torch
from torch.autograd import Variable, gradcheck

# Case 1: move the tensor to the GPU first, then wrap it in a Variable.
inputs1 = (
    Variable(torch.randn(3, 1, 2).float().cuda(), requires_grad=True),
    Variable(torch.randn(3, 2, 1).float().cuda(), requires_grad=True),
)

test1 = gradcheck(torch.bmm, inputs1, eps=1e-3, atol=1e-3)  # passes

# Case 2: wrap the CPU tensor in a Variable first, then call .cuda() on the Variable.
inputs2 = (
    Variable(torch.randn(3, 1, 2).float(), requires_grad=True).cuda(),
    Variable(torch.randn(3, 2, 1).float(), requires_grad=True).cuda(),
)

# raise_exception=False makes gradcheck return False instead of raising.
test2 = gradcheck(torch.bmm, inputs2, eps=1e-3, atol=1e-3, raise_exception=False)  # fails


See the thread "Variable grad is always None when extending autograd".

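Calling .cuda() on a Variable creates another Variable that isn't a leaf node in the computation graph. Gradients only accumulate on leaves (here, the original CPU Variable), so when you use the .cuda() result as an input, its .grad stays None and the check fails. Call .cuda() on the tensor first and then wrap it in a Variable, as in inputs1.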
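
To make the leaf vs. non-leaf distinction concrete, here is a minimal sketch. It assumes a recent PyTorch release, where Variable has been merged into Tensor and requires_grad is set on the tensor directly. The helper `move` is hypothetical and not part of PyTorch: it calls .cuda() when a GPU is available, and otherwise falls back to a dtype cast, which is also a differentiable op and so behaves the same way for this purpose.

```python
import torch

def move(t):
    # Hypothetical helper (not a PyTorch API): .cuda() if a GPU is present,
    # otherwise a float -> double cast as a CPU-only stand-in. Both are
    # differentiable ops when applied to a tensor that requires grad.
    if torch.cuda.is_available():
        return t.cuda()
    return t.double()

# Like inputs1: move first, then turn on requires_grad. The result is a leaf.
a = move(torch.randn(3, 1, 2)).requires_grad_()

# Like inputs2: turn on requires_grad first, then move. The result is NOT a leaf;
# it is the output of an op whose input is the original leaf.
b_leaf = torch.randn(3, 1, 2, requires_grad=True)
b = move(b_leaf)

a.sum().backward()
b.sum().backward()

print(a.is_leaf, a.grad is not None)            # gradient lands on a itself
print(b.is_leaf, b.grad_fn is not None)         # b has a grad_fn, so it is not a leaf
print(b_leaf.is_leaf, b_leaf.grad is not None)  # gradient lands on the original leaf
```

If you already have a non-leaf tensor on the GPU and want to use it as a gradcheck input, one option is `b.detach().requires_grad_()`, which gives a fresh leaf with the same data.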