CUDA out of memory with adding operation

Hello everyone,

I’ve recently run into a CUDA out-of-memory issue in my project when an “adding operation” is performed. I have no idea why adding would cost so much memory, since the two tensors themselves only take ~1 MB.


Do you have any suggestions on how to debug this issue?

Check the shapes of both tensors, as they might be broadcast against each other, as seen here:

import torch

# Each tensor holds 2**28 float32 values, i.e. 1 GiB.
x = torch.randn(1024**3 // 4, 1, device="cuda")
y = torch.randn(1, 1024**3 // 4, device="cuda")
print(torch.cuda.memory_allocated() / 1024**3)
# 2.0

# Broadcasting expands the result to shape (2**28, 2**28).
z = x + y
# OutOfMemoryError: CUDA out of memory. Tried to allocate 268435456.00 GiB
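
If you want to confirm whether broadcasting is the culprit before running the op, torch.broadcast_shapes computes the result shape from the input shapes alone, without allocating anything. A minimal sketch (the byte estimate assumes float32 tensors):

import math
import torch

# Shapes from the example above; no tensors are allocated here.
out_shape = torch.broadcast_shapes((1024**3 // 4, 1), (1, 1024**3 // 4))
print(out_shape)
# torch.Size([268435456, 268435456])

# Rough size of the would-be result, assuming float32 (4 bytes/element).
print(math.prod(out_shape) * 4 / 1024**3, "GiB")
# 268435456.0 GiB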

Thank you so much! Indeed, the sizes are not the same. The first tensor is of size (100000, 3), the second tensor is of size (100000, 1, 3). I never thought it was due to broadcasting.
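
For anyone hitting the same thing: with those shapes, the (100000, 3) tensor is treated as (1, 100000, 3) and broadcast against (100000, 1, 3), so the sum would have shape (100000, 100000, 3), roughly 120 GB in float32. A small sketch with stand-in sizes (N = 5 instead of 100000), including one possible fix if elementwise addition was intended:

import torch

N = 5
a = torch.randn(N, 3)      # shape (N, 3)
b = torch.randn(N, 1, 3)   # shape (N, 1, 3)

# (N, 3) is treated as (1, N, 3) and broadcast against (N, 1, 3).
print((a + b).shape)
# torch.Size([5, 5, 3])

# If elementwise addition was intended, make the shapes match first,
# e.g. by dropping the singleton dimension:
print((a + b.squeeze(1)).shape)
# torch.Size([5, 3])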