I’m working on an app that uses many tensor operations, some of which are performed on batches of image tensors.
To summarize briefly: I pass a batch of 32 images to a function, extract some features from them, and fill a new zero-tensor with the results. This filled-in results tensor then goes through some additional transformations, which are performed on the GPU.
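A simplified sketch of the pattern (the function, shapes, and transformations here are made-up stand-ins; my real code is more involved):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

def extract_features(batch):
    # stand-in for the real feature extractor:
    # global-average-pool each image down to one vector per image
    return batch.mean(dim=(2, 3))

batch = torch.rand(32, 3, 64, 64)        # one batch of 32 RGB images

results = torch.zeros(32, 3)             # fresh zero-tensor...
results[:] = extract_features(batch)     # ...filled in with the features

results = results.to(device)             # additional transforms run on GPU
results = 1.0 - results / results.max()  # e.g. some rescaling
```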
My environment:
- p2.xlarge EC2 instance
- python 3.6.9
- torch 1.8.1
On the second batch, the code crashes with the following error:
RuntimeError: CUDA out of memory. Tried to allocate 1.48 GiB (GPU 0; 11.17 GiB total capacity; 8.15 GiB already allocated; 1.46 GiB free; 9.27 GiB reserved in total by PyTorch)
This is surprising, given that the operations being performed should not be especially resource-intensive.
I ran the torch profiler on my code and sorted the results by CUDA memory usage to see what is causing me to run out of memory:
-----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  -------------  ------------
Name                       Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg       CPU Mem  Self CPU Mem      CUDA Mem  Self CUDA Mem    # of Calls
-----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  -------------  ------------
aten::empty_strided             1.51%       7.888ms         1.51%       7.888ms     108.056us     986.79 Mb     986.79 Mb       5.75 Gb        5.75 Gb            73
aten::to                        0.13%     662.014us        71.84%     374.072ms       5.055ms     986.79 Mb           0 b       3.90 Gb            0 b            74
aten::empty                     1.16%       6.021ms         1.16%       6.021ms      59.616us           0 b           0 b       3.21 Gb        3.21 Gb           101
aten::mul                       0.02%     121.185us         0.95%       4.954ms       1.651ms           0 b           0 b       2.97 Gb            0 b             3
aten::gather                    0.02%      86.385us         0.63%       3.255ms       3.255ms           0 b           0 b       1.48 Gb            0 b             1
aten::resize_                   0.61%       3.158ms         0.61%       3.158ms       3.158ms           0 b           0 b       1.48 Gb        1.48 Gb             1
aten::add                       0.62%       3.243ms         0.62%       3.243ms       3.243ms           0 b           0 b       1.48 Gb        1.48 Gb             1
aten::rsub                      0.02%      82.626us         0.35%       1.843ms       1.843ms           0 b           0 b     760.00 Mb            0 b             1
aten::div                       0.01%      45.086us         0.01%      55.337us      55.337us           0 b           0 b     759.38 Mb            0 b             1
aten::empty_like                0.03%     178.903us         0.14%     748.427us      22.680us           0 b           0 b     440.99 Mb            0 b            33
-----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  -------------  ------------
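For reference, this is roughly how I collected the table above (the profiled function is a placeholder for my actual per-batch work):

```python
import torch

def process_batch(batch):
    # placeholder for the real per-batch feature extraction and transforms
    return (1.0 - batch.mean(dim=(2, 3))).sum()

batch = torch.rand(32, 3, 64, 64)

with torch.autograd.profiler.profile(
    use_cuda=torch.cuda.is_available(),  # falls back to CPU-only profiling
    profile_memory=True,
) as prof:
    process_batch(batch)

# sort by total CUDA memory to surface the worst offenders
table = prof.key_averages().table(sort_by="cuda_memory_usage", row_limit=10)
print(table)
```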
Two of the worst offenders are the empty_strided and empty functions, which together consume about 9 GB of CUDA memory (roughly 80% of the GPU’s 11.17 GiB capacity). I never call either of these functions anywhere in my code, so I have absolutely no idea why they are apparently being invoked so many times.
Does anyone have any idea why this is happening, and how I can fix it?