Hi all,
I’m wondering about the backend difference between the following two constructions:
t = torch.ones([1, 10], requires_grad=True, device='cuda:0')
and
te = torch.ones([1, 10], requires_grad=True)
te = te.cuda(device='cuda:0')
The issue shows up in the following example, which relates to https://github.com/pytorch/pytorch/issues/7425:
import torch.nn as nn
import torch

class testModule(nn.Module):
    def __init__(self):
        super(testModule, self).__init__()
        self.lin = nn.Linear(10, 1)

    def forward(self, x):
        return self.lin(x)

def test_cpu():
    t = torch.ones([1, 10], requires_grad=True)
    mod = testModule()
    output = mod(t)
    output[0].backward()
    test = t.grad
    return test

def test_gpu_orig():
    mod = testModule().cuda()
    te = torch.ones([1, 10], requires_grad=True)
    te = te.cuda(device='cuda:0')
    output = mod(te)
    output[0].backward()
    test = te.grad
    return test

def test_gpu_reco():
    mod = testModule().cuda()
    t = torch.ones([1, 10], requires_grad=True, device='cuda:0')
    output = mod(t)
    output[0].backward()
    test = t.grad
    return test

print(test_cpu())
print(test_gpu_orig())
print(test_gpu_reco())
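Running this, the three prints come out along the following lines (I've elided the actual gradient values, since they depend on the random initialisation of nn.Linear):

tensor([[...]])                     # test_cpu()
None                                # test_gpu_orig()
tensor([[...]], device='cuda:0')    # test_gpu_reco()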
As you can see, test_gpu_orig() returns None, while test_gpu_reco() returns the gradient of the output with respect to the input, as expected. However, if you simply run the lines defining t and te as above, the two tensors look identical as far as I can tell (see the quick comparison at the end of this post). What is the difference between them?
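For completeness, here is a minimal comparison one could run on the two constructions; is_leaf and grad_fn are the attributes I would guess might differ between them, though I'm not sure whether that is what explains the behaviour above:

import torch

# Construction 1: tensor created directly on cuda:0.
t = torch.ones([1, 10], requires_grad=True, device='cuda:0')

# Construction 2: tensor created on the CPU, then copied to cuda:0.
te = torch.ones([1, 10], requires_grad=True)
te = te.cuda(device='cuda:0')

print(t)
print(te)
print(t.is_leaf, te.is_leaf)    # is this where the two diverge?
print(t.grad_fn, te.grad_fn)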