I found that .cuda() and .cpu() behave differently for a torch.Tensor and an nn.Module instance. It really confuses me…
for a tensor:
a_cuda = torch.tensor([1.0, 2.0], device='cuda:0')
a_cpu = a_cuda.cpu()
print('----cuda before-------')
print(a_cuda)
print('----cpu before-------')
print(a_cpu)
print('\n')
a_cuda += 100
print('----cuda after-------')
print(a_cuda)
print('----cpu after-------')
print(a_cpu)
the output is
----cuda before-------
tensor([1., 2.], device='cuda:0')
----cpu before-------
tensor([1., 2.])
----cuda after-------
tensor([101., 102.], device='cuda:0')
----cpu after-------
tensor([1., 2.])
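To confirm the copy semantics without needing a GPU, here is a small sketch of my understanding: Tensor.to() (which .cpu() and .cuda() are shorthand for) returns the tensor itself when no move is needed, and a tensor with separate storage when one is (the copy=True flag forces a copy either way):

```python
import torch

a = torch.tensor([1.0, 2.0])       # already on CPU
same = a.to('cpu')                 # no move needed -> returns the very same tensor
copied = a.to('cpu', copy=True)    # forced copy, like a real cross-device move

print(same is a)       # True: no copy was made
print(copied is a)     # False: separate storage

a += 100
print(copied)          # unchanged: tensor([1., 2.])
print(a)               # tensor([101., 102.])
```

So a_cpu = a_cuda.cpu() really does allocate new storage, which matches the tensor output above.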
I think this is fine, but for a network:
net_cuda = Model().cuda()
net_cpu = net_cuda.cpu()
print('----cuda before-------')
for p in net_cuda.parameters():
    print(p)
print('----cpu before-------')
for p in net_cpu.parameters():
    print(p)
print('\n\n')
with torch.no_grad():
    for p in net_cuda.parameters():
        p += 100
print('----cuda after-------')
for p in net_cuda.parameters():
    print(p)
print('----cpu after-------')
for p in net_cpu.parameters():
    print(p)
the output is:
----cuda before-------
Parameter containing:
tensor([[ 0.0328, -0.0506]], requires_grad=True)
----cpu before-------
Parameter containing:
tensor([[ 0.0328, -0.0506]], requires_grad=True)
----cuda after-------
Parameter containing:
tensor([[100.0328, 99.9494]], requires_grad=True)
----cpu after-------
Parameter containing:
tensor([[100.0328, 99.9494]], requires_grad=True)
If .cpu() makes a copy of the original data, why does the cpu network change after I modify the original cuda parameters? Note that even the "cuda before" output shows the parameters without device='cuda:0'.
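My current guess, sketched with a CPU-only stand-in (nn.Linear replaces the Model() above, no GPU required): unlike Tensor.cpu(), Module.cpu() moves the parameters in place and returns self, so net_cuda and net_cpu would be two names for the same module:

```python
import torch
import torch.nn as nn

net = nn.Linear(2, 1)   # hypothetical stand-in for Model()
net2 = net.cpu()        # Module.cpu() converts parameters in place and returns self

print(net2 is net)      # True: same object, not a copy

before = net.weight.detach().clone()
with torch.no_grad():
    for p in net.parameters():
        p += 100

# net2 sees the same change, because its parameters ARE net's parameters
print(torch.equal(net2.weight, before + 100))   # True
```

Is that the right mental model, and is it the intended behavior?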