CUDA out of memory with adding operation

Hello everyone,

I’ve recently run into a CUDA out-of-memory issue in my project when an “adding operation” is performed. I have no idea why adding would cost so much memory, since the two tensors themselves only take ~1 MB.


Do you have any suggestions on how to debug this issue?

Check the shapes of both tensors, as they might be broadcast against each other, as seen here:

import torch

# Each tensor holds 2**28 float32 values, i.e. 1 GiB.
x = torch.randn(1024**3 // 4, 1, device="cuda")
y = torch.randn(1, 1024**3 // 4, device="cuda")
print(torch.cuda.memory_allocated() / 1024**3)
# 2.0

# Broadcasting expands the result to shape (2**28, 2**28).
z = x + y
# OutOfMemoryError: CUDA out of memory. Tried to allocate 268435456.00 GiB
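
If you want to confirm whether broadcasting is the culprit before running the op, torch.broadcast_shapes computes the result shape from the input shapes alone, without allocating anything. A minimal sketch (the byte estimate assumes float32 tensors):

import math
import torch

# Shapes from the example above; no tensors are allocated here.
out_shape = torch.broadcast_shapes((1024**3 // 4, 1), (1, 1024**3 // 4))
print(out_shape)
# torch.Size([268435456, 268435456])

# Rough size of the would-be result, assuming float32 (4 bytes/element).
print(math.prod(out_shape) * 4 / 1024**3, "GiB")
# 268435456.0 GiB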

Thank you so much! Indeed, the sizes are not the same. The first tensor is of size (100000, 3), the second tensor is of size (100000, 1, 3). I never thought it was due to broadcasting.
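
For anyone hitting the same thing: with those shapes, the (100000, 3) tensor is treated as (1, 100000, 3) and broadcast against (100000, 1, 3), so the sum would have shape (100000, 100000, 3), roughly 120 GB in float32. A small sketch with stand-in sizes (N = 5 instead of 100000), including one possible fix if elementwise addition was intended:

import torch

N = 5
a = torch.randn(N, 3)      # shape (N, 3)
b = torch.randn(N, 1, 3)   # shape (N, 1, 3)

# (N, 3) is treated as (1, N, 3) and broadcast against (N, 1, 3).
print((a + b).shape)
# torch.Size([5, 5, 3])

# If elementwise addition was intended, make the shapes match first,
# e.g. by dropping the singleton dimension:
print((a + b.squeeze(1)).shape)
# torch.Size([5, 3])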