CUDA out of memory in simple subtraction

I have two tensors, a and b, and I want to subtract them on the GPU (CUDA) inside a neural network evaluation.
Each tensor contains about 1e5 float32 elements: both have shape torch.Size([161858]) and take 0.647432 MB each. When I subtract them the naive way:

c = a - b 

I get the following out of memory error:

RuntimeError: CUDA out of memory. Tried to allocate 97.60 GiB (GPU 0; 15.75 GiB total capacity; 52.60 MiB already allocated; 11.73 GiB free; 82.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
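The size in the error message is itself a clue. A quick arithmetic check, assuming 4 bytes per float32 element: one input vector of 161858 elements is indeed about 0.65 MB, but a 161858 × 161858 float32 matrix needs exactly the 97.60 GiB that PyTorch tried to allocate. That suggests the result of the subtraction is being expanded to a square matrix.

```python
n = 161858              # number of elements in each input tensor
bytes_per_float32 = 4

# Size of one input vector, in decimal megabytes (as quoted in the question)
input_mb = n * bytes_per_float32 / 1e6
print(f"{input_mb:.6f} MB")       # 0.647432 MB

# Size of an n x n float32 matrix, in GiB (the unit PyTorch reports)
square_gib = n * n * bytes_per_float32 / 2**30
print(f"{square_gib:.2f} GiB")    # 97.60 GiB
```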

Using torch.subtract gives the same error.

However, I can avoid the memory issue through the following (less efficient) method:

c = torch.zeros(a.shape[0], device=a.device)  # output on the same device as the inputs
for i in range(a.shape[0]):
    c[i] = a[i] - b[i]  # subtract one element pair at a time
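A small CPU sketch of why the element-wise loop does not run out of memory, assuming b has the extra trailing dimension of size 1 that the last post in this thread reveals (the sizes here are hypothetical). Indexing a single element removes the shape mismatch: a[i] is a 0-d scalar and b[i] is a one-element vector, so each subtraction produces just one value instead of an N × N matrix. The loop is still slow, since it runs N separate tiny operations from Python.

```python
import torch

n = 4
a = torch.arange(n, dtype=torch.float32)  # shape [n]
b = torch.ones(n, 1)                      # shape [n, 1], like b in this thread

print(a[0].shape)            # torch.Size([])   0-d scalar
print(b[0].shape)            # torch.Size([1])  one-element vector
print((a[0] - b[0]).shape)   # torch.Size([1])  still only one element

c = torch.zeros(n)
for i in range(n):
    c[i] = a[i] - b[i]       # the one-element result is stored into slot i
```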

What's going on here? Why does CUDA try to allocate 97 GiB when each tensor is smaller than 1 MB? And why does the loop work while the direct subtraction doesn't?

Thank you in advance,

Could you post the exact shapes of a, b, and c? I would guess that some unwanted broadcasting might happen in the subtraction calls.
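To illustrate the suspected broadcasting with small stand-in tensors (hypothetical sizes): PyTorch aligns shapes from the right, so a shape [N] is treated as [1, N] when paired with a shape [N, 1], and both size-1 dimensions are stretched, giving an [N, N] result. torch.broadcast_shapes computes the resulting shape without materializing the tensors, which is a cheap way to check the real sizes.

```python
import torch

a = torch.zeros(3)        # shape [3]
b = torch.zeros(3, 1)     # shape [3, 1]: one extra trailing dimension

c = a - b
print(c.shape)            # torch.Size([3, 3])

# Same broadcasting rule at the thread's real size, without allocating the result
print(torch.broadcast_shapes((161858,), (161858, 1)))  # torch.Size([161858, 161858])
```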


You were right: a had shape torch.Size([161858]) but b had shape torch.Size([161858, 1]). Flattening b with b.view(-1) fixed the memory issue. Thanks!
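A minimal sketch of the fix plus a guard against the same mistake. `sub_same_shape` is a hypothetical helper, not a PyTorch API: it refuses to subtract tensors whose shapes differ, so accidental broadcasting raises an error instead of a huge allocation. `b.view(-1)` is the fix used above; `b.squeeze(-1)` is an alternative that removes only a trailing size-1 dimension, so it will not silently flatten a tensor that has a genuine second dimension.

```python
import torch

def sub_same_shape(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: subtract only when shapes match exactly."""
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {tuple(a.shape)} vs {tuple(b.shape)}")
    return a - b

a = torch.randn(5)        # shape [5]
b = torch.randn(5, 1)     # shape [5, 1], the problematic layout

c1 = a - b.view(-1)       # flatten [5, 1] -> [5]
c2 = a - b.squeeze(-1)    # drop only the trailing size-1 dimension

try:
    sub_same_shape(a, b)  # [5] vs [5, 1] is rejected instead of broadcast
except ValueError as err:
    print(err)            # shape mismatch: (5,) vs (5, 1)

c = sub_same_shape(a, b.squeeze(-1))  # shapes match, plain element-wise result
```

Note that view(-1) needs a memory layout compatible with the new shape; b.reshape(-1) works in the general case by copying when necessary.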