I have two tensors, a and b, and I want to subtract them in CUDA, inside a neural network evaluation. Each of these tensors contains ~1e5 float32 elements, specifically with shape torch.Size([161858]) and 0.647432 MB each. When I try to subtract them in the naive way:
c = a - b
I get the following out of memory error:
RuntimeError: CUDA out of memory. Tried to allocate 97.60 GiB (GPU 0; 15.75 GiB total capacity; 52.60 MiB already allocated; 11.73 GiB free; 82.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Using torch.subtract I get the same error.
However, I can avoid the memory issue through the following (less efficient) method:
c = torch.zeros(a.shape[0], device=self.device)
for i in range(a.shape[0]):
    c[i] = a[i] - b[i]
What’s going on here? Why does CUDA try to allocate 97 GiB when each of the tensors is less than 1 MB? And why does the latter method work, while the former doesn’t?
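One thing I noticed while writing this up (the mismatched shape below is purely a hypothetical guess on my part, not something I have verified in my code): 97.60 GiB is exactly the size of a 161858 × 161858 float32 matrix, so perhaps broadcasting is somehow producing a 2-D result?

```python
import torch

n = 161858

# The failed allocation (97.60 GiB) matches an n x n float32 matrix exactly:
print(n * n * 4 / 2**30)  # ~97.6 GiB

# Hypothetical: if one operand had a stray trailing dimension, a - b would
# broadcast to [n, n] instead of staying elementwise.
a = torch.zeros(n)      # shape [161858]
b = torch.zeros(n, 1)   # shape [161858, 1] -- purely a guess, not my real tensor
print(torch.broadcast_shapes(a.shape, b.shape))  # torch.Size([161858, 161858])
```

If that were the case, it would also be consistent with the loop succeeding, since b[i] would then be a length-1 tensor that broadcasts harmlessly against the scalar a[i].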
Thank you in advance,
Pablo.