CUDA out of memory in a simple operation

Hello!

I’ve been reading through previously solved threads about CUDA out of memory errors, but I haven’t found anything that helps in my case, which is why I’m opening this thread.

My code is the following: (y - x).pow(2).div(sigma.mul(2)), and the error I receive is: RuntimeError: CUDA out of memory. Tried to allocate 129.21 GiB (GPU 0; 31.75 GiB total capacity; 230.09 MiB already allocated; 30.26 GiB free; 280.00 MiB reserved in total by PyTorch).

I don’t understand how this operation could try to allocate around 130 GiB. The size of each tensor, obtained with a.element_size() * a.nelement(), is 744960 bytes. The simple operation (y - x) alone already triggers this error.

Does anyone have any idea how this could be happening with tensors of this size? I honestly don’t get it.

Could you post the shapes of both tensors? I would guess that broadcasting is being applied, which would increase the size of the result, as the sketch below illustrates.
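
For example, here is a minimal sketch of how broadcasting can turn two small inputs into a huge result. The shapes and dtype below are assumptions for illustration only, since the actual shapes haven’t been posted:

```python
import torch

# Hypothetical shapes for illustration -- the real shapes were not posted.
# Each input holds only N elements, but subtracting a column vector from a
# row vector broadcasts to an (N, N) result.
N = 1000
x = torch.randn(N, 1)   # shape (N, 1)
y = torch.randn(1, N)   # shape (1, N)

diff = y - x            # broadcast result has shape (N, N)
print(diff.shape)       # torch.Size([1000, 1000])

# Memory for the result grows quadratically with N. Assuming float32 inputs,
# 744960 bytes corresponds to 186240 elements; a (186240, 186240) broadcast
# result would need roughly 186240**2 * 4 bytes ~= 129 GiB, which is about
# the allocation size reported in the error message.
```

If the shapes do broadcast like this, making them match explicitly (or processing the broadcast dimension in chunks) would avoid materializing the huge intermediate tensor.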