I found that .cuda() and .cpu() behave differently for a torch.Tensor and an nn.Module instance. It really confuses me…
for a tensor:
a_cuda = torch.tensor([1.0, 2.0], device='cuda:0')
a_cpu = a_cuda.cpu()
print('----cuda before-------')
print(a_cuda)
print('----cpu before-------')
print(a_cpu)
print('\n')
a_cuda += 100
print('----cuda after-------')
print(a_cuda)
print('----cpu after-------')
print(a_cpu)
the output is
----cuda before-------
tensor([1., 2.], device='cuda:0')
----cpu before-------
tensor([1., 2.])
----cuda after-------
tensor([101., 102.], device='cuda:0')
----cpu after-------
tensor([1., 2.])
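To confirm the copy semantics without needing a GPU, here is a small sketch of my understanding: Tensor.to() (which .cpu() and .cuda() are shorthand for) returns the tensor itself when no move is needed, and a tensor with separate storage when one is (the copy=True flag forces a copy either way):

```python
import torch

a = torch.tensor([1.0, 2.0])       # already on CPU
same = a.to('cpu')                 # no move needed -> returns the very same tensor
copied = a.to('cpu', copy=True)    # forced copy, like a real cross-device move

print(same is a)       # True: no copy was made
print(copied is a)     # False: separate storage

a += 100
print(copied)          # unchanged: tensor([1., 2.])
print(a)               # tensor([101., 102.])
```

So a_cpu = a_cuda.cpu() really does allocate new storage, which matches the tensor output above.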
I think this is fine, but for a network:
net_cuda = Model().cuda()
net_cpu = net_cuda.cpu()
print('----cuda before-------')
for p in net_cuda.parameters():
    print(p)
print('----cpu before-------')
for p in net_cpu.parameters():
    print(p)
print('\n\n')
with torch.no_grad():
    for p in net_cuda.parameters():
        p += 100
print('----cuda after-------')
for p in net_cuda.parameters():
    print(p)
print('----cpu after-------')
for p in net_cpu.parameters():
    print(p)
the output is:
----cuda before-------
Parameter containing:
tensor([[ 0.0328, -0.0506]], requires_grad=True)
----cpu before-------
Parameter containing:
tensor([[ 0.0328, -0.0506]], requires_grad=True)
----cuda after-------
Parameter containing:
tensor([[100.0328, 99.9494]], requires_grad=True)
----cpu after-------
Parameter containing:
tensor([[100.0328, 99.9494]], requires_grad=True)
If .cpu() makes a copy of the original data, why does the cpu network change after I modify the original cuda parameters? Note that even the "cuda before" output shows the parameters without device='cuda:0'.
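My current guess, sketched with a CPU-only stand-in (nn.Linear replaces the Model() above, no GPU required): unlike Tensor.cpu(), Module.cpu() moves the parameters in place and returns self, so net_cuda and net_cpu would be two names for the same module:

```python
import torch
import torch.nn as nn

net = nn.Linear(2, 1)   # hypothetical stand-in for Model()
net2 = net.cpu()        # Module.cpu() converts parameters in place and returns self

print(net2 is net)      # True: same object, not a copy

before = net.weight.detach().clone()
with torch.no_grad():
    for p in net.parameters():
        p += 100

# net2 sees the same change, because its parameters ARE net's parameters
print(torch.equal(net2.weight, before + 100))   # True
```

Is that the right mental model, and is it the intended behavior?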