Difference between two very similar function calls

Hi all,

I’m wondering about the backend difference between the following two constructions:

t = torch.ones([1, 10], requires_grad=True, device='cuda:0')

and

te = torch.ones([1, 10], requires_grad=True)
te = te.cuda(device='cuda:0')
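If you inspect the obvious attributes of the two results (this assumes a machine with a CUDA device), they report the same thing:

import torch

t = torch.ones([1, 10], requires_grad=True, device='cuda:0')

te = torch.ones([1, 10], requires_grad=True)
te = te.cuda(device='cuda:0')

print(t.device, t.requires_grad)    # cuda:0 True
print(te.device, te.requires_grad)  # cuda:0 True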

The issue is illustrated by the following example, related to https://github.com/pytorch/pytorch/issues/7425:

import torch
import torch.nn as nn

class testModule(nn.Module):
    def __init__(self):
        super(testModule, self).__init__()
        self.lin = nn.Linear(10, 1)

    def forward(self, x):
        return self.lin(x)

def test_cpu():
    t = torch.ones([1, 10], requires_grad=True)
    mod = testModule()
    output = mod(t)
    output[0].backward()
    test = t.grad
    return test

def test_gpu_orig():
    mod = testModule().cuda()
    te = torch.ones([1, 10], requires_grad=True)
    te = te.cuda(device='cuda:0')
    output = mod(te)
    output[0].backward()
    test = te.grad
    return test

def test_gpu_reco():
    mod = testModule().cuda()
    t = torch.ones([1, 10], requires_grad=True, device='cuda:0')
    output = mod(t)
    output[0].backward()
    test = t.grad
    return test

print(test_cpu())
print(test_gpu_orig())
print(test_gpu_reco())

As you can see, test_gpu_orig() returns None, while test_gpu_reco() returns the gradient of the output with respect to the input, as expected. However, if you simply run the lines defining t and te as above, the resulting tensors look identical as far as I can tell. What is the difference between them?

This is because

te = torch.ones([1, 10], requires_grad=True)

creates the leaf variable (which will receive the grad) and

te = te.cuda(device='cuda:0')

rebinds the name te to something else: a new, non-leaf variable produced by the copy to the GPU. Since autograd only populates .grad on leaf variables by default, the rebound te never receives a gradient.
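You can see this directly by checking is_leaf and grad_fn (again assuming a CUDA device):

import torch

te = torch.ones([1, 10], requires_grad=True)
tec = te.cuda(device='cuda:0')

print(te.is_leaf, te.grad_fn)    # True None -> leaf, will get .grad
print(tec.is_leaf, tec.grad_fn)  # False, with a grad_fn recording the copy -> non-leaf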
If you do

te = torch.ones([1, 10], requires_grad=True)
tec = te.cuda(device='cuda:0')
output = mod(tec)

instead, you keep the name of the original leaf around and you'll get the same result as in the reco variant.
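For completeness, here is test_gpu_orig with that fix applied (a minimal sketch, assuming a CUDA device); note that te.grad then lives on the CPU, where the leaf is:

import torch
import torch.nn as nn

mod = nn.Linear(10, 1).cuda()                 # stands in for testModule above
te = torch.ones([1, 10], requires_grad=True)  # CPU leaf, will receive .grad
tec = te.cuda(device='cuda:0')                # non-leaf copy on the GPU
output = mod(tec)
output[0].backward()
print(te.grad)   # populated now; the gradient arrives on the CPU leaf
print(tec.grad)  # still None -- .grad is only retained on leaves by default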

Best regards

Thomas