Copy a CPU tensor to the GPU without allocating new GPU memory (in-place copy of a CPU tensor into a GPU tensor)

I'm running short of GPU memory.

When copying a CPU tensor to the GPU, e.g.

a = torch.ones(100000).cuda()  # 100000 * 4 bytes of GPU memory required
b = torch.ones(100000).fill_(2)
a.data = b.cuda()  # 200000 * 4 bytes of GPU memory required, but I only have 100000 * 4 bytes of GPU memory for the program

A new GPU buffer is allocated for a.data = b.cuda(). My GPU memory is very limited, so this copy would cause an OOM error.

Is there an in-place copy of a CPU tensor into a GPU tensor? The buffer of a in the example above already exists; I want a to take the values of b without any additional GPU memory allocation.

Use an in-place copy: a.copy_(b):

a = torch.randn(1024**2 * 5, device="cuda")
print(torch.cuda.memory_allocated() / 1024**2)
# 20.0
print(torch.cuda.max_memory_allocated() / 1024**2)
# 20.0
print(torch.cuda.memory_reserved() / 1024**2)
# 20.0

b = torch.randn(1024**2 * 5)
a.copy_(b)
print(torch.cuda.memory_allocated() / 1024**2)
# 20.0
print(torch.cuda.max_memory_allocated() / 1024**2)
# 20.0
print(torch.cuda.memory_reserved() / 1024**2)
# 20.0
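A quick way to convince yourself that copy_ reuses the destination's buffer is to compare data_ptr() before and after the copy; the pointer to the underlying storage does not change. This is a minimal sketch (the tensor size here is arbitrary, and the CUDA check at the end only runs if a GPU is available):

```python
import torch

# copy_ writes into the destination's existing storage,
# so the destination's data_ptr stays the same.
a = torch.ones(100000)
b = torch.ones(100000).fill_(2)

ptr_before = a.data_ptr()
a.copy_(b)  # in-place copy; no new buffer is allocated for a
assert a.data_ptr() == ptr_before
assert bool((a == 2).all())

# On a CUDA machine the same holds for a CPU -> GPU copy:
# the allocator's usage should stay flat.
if torch.cuda.is_available():
    g = torch.ones(100000, device="cuda")
    before = torch.cuda.memory_allocated()
    g.copy_(b)  # copies across devices into g's existing buffer
    assert torch.cuda.memory_allocated() == before
```

In contrast, a.data = b.cuda() first materializes b on the GPU as a brand-new tensor and then rebinds a to it, which is why it needs a second buffer.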

Thank you so much, @ptrblck.
I previously thought b had to be on the same device as a when executing a.copy_(b). You saved my day :grinning: