CUDA memory error caused by `torch.empty_strided` -- which I'm not using anywhere

I’m working on an app that uses many tensor operations, some of which are performed on batches of image tensors.

To summarize briefly: I pass a batch of 32 images to a function, extract some features from them, and then fill a new zero-initialized tensor with the results. This results tensor then undergoes some additional transformations, which are performed on the GPU.

My environment:

  • p2.xlarge EC2 instance
  • python 3.6.9
  • torch 1.8.1

On the second batch, the code crashes with the following error:
RuntimeError: CUDA out of memory. Tried to allocate 1.48 GiB (GPU 0; 11.17 GiB total capacity; 8.15 GiB already allocated; 1.46 GiB free; 9.27 GiB reserved in total by PyTorch)

This is surprising, given that the operations being performed should not be especially resource-intensive.
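The error message distinguishes memory "already allocated" by live tensors from memory "reserved in total by PyTorch" (held by the caching allocator). A minimal diagnostic sketch, assuming you can call it before and after each batch, reads the same numbers from `torch.cuda`; the helper name `gpu_memory_report` is hypothetical. If `allocated` is already high before the second batch starts, tensors from the first batch are probably still referenced somewhere.

```python
import torch


def gpu_memory_report(device=0):
    """Hypothetical helper: the caching allocator's view of GPU memory in GiB.

    Returns None on machines without CUDA.
    """
    if not torch.cuda.is_available():
        return None
    gib = 1024 ** 3
    return {
        # memory currently occupied by live tensors ("already allocated")
        "allocated": torch.cuda.memory_allocated(device) / gib,
        # memory held by the caching allocator ("reserved in total by PyTorch")
        "reserved": torch.cuda.memory_reserved(device) / gib,
        # high-water mark of allocated memory since start or the last reset
        "max_allocated": torch.cuda.max_memory_allocated(device) / gib,
    }


if __name__ == "__main__":
    print(gpu_memory_report())
```

Calling `torch.cuda.reset_peak_memory_stats()` at the start of each batch makes `max_allocated` report the peak for that batch alone.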

I ran the torch profiler on my code and sorted the results by CUDA memory usage to see what is causing me to run out of memory:

-----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg       CPU Mem  Self CPU Mem      CUDA Mem  Self CUDA Mem    # of Calls  
-----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
    aten::empty_strided         1.51%       7.888ms         1.51%       7.888ms     108.056us     986.79 Mb     986.79 Mb       5.75 Gb       5.75 Gb            73  
               aten::to         0.13%     662.014us        71.84%     374.072ms       5.055ms     986.79 Mb           0 b       3.90 Gb           0 b            74  
            aten::empty         1.16%       6.021ms         1.16%       6.021ms      59.616us           0 b           0 b       3.21 Gb       3.21 Gb           101  
              aten::mul         0.02%     121.185us         0.95%       4.954ms       1.651ms           0 b           0 b       2.97 Gb           0 b             3  
           aten::gather         0.02%      86.385us         0.63%       3.255ms       3.255ms           0 b           0 b       1.48 Gb           0 b             1  
          aten::resize_         0.61%       3.158ms         0.61%       3.158ms       3.158ms           0 b           0 b       1.48 Gb       1.48 Gb             1  
              aten::add         0.62%       3.243ms         0.62%       3.243ms       3.243ms           0 b           0 b       1.48 Gb       1.48 Gb             1  
             aten::rsub         0.02%      82.626us         0.35%       1.843ms       1.843ms           0 b           0 b     760.00 Mb           0 b             1  
              aten::div         0.01%      45.086us         0.01%      55.337us      55.337us           0 b           0 b     759.38 Mb           0 b             1  
       aten::empty_like         0.03%     178.903us         0.14%     748.427us      22.680us           0 b           0 b     440.99 Mb           0 b            33  
-----------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
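A table like the one above can be produced with the `torch.profiler` API that ships with torch 1.8.1. The sketch below is an assumption about how such a table is generated, not the exact code used here; `profile_by_memory` and the `step` workload are hypothetical stand-ins. It falls back to CPU profiling when no GPU is present.

```python
import torch
from torch.profiler import profile, ProfilerActivity


def profile_by_memory(fn, *args, row_limit=10):
    """Hypothetical helper: run fn(*args) under the profiler, return a memory-sorted table."""
    use_cuda = torch.cuda.is_available()
    activities = [ProfilerActivity.CPU]
    if use_cuda:
        activities.append(ProfilerActivity.CUDA)
    # profile_memory=True records tensor allocations and frees per operator
    with profile(activities=activities, profile_memory=True) as prof:
        fn(*args)
    sort_key = "cuda_memory_usage" if use_cuda else "cpu_memory_usage"
    return prof.key_averages().table(sort_by=sort_key, row_limit=row_limit)


def step(batch):
    # Hypothetical workload: a dtype conversion followed by some arithmetic
    x = batch.to(torch.float64)
    return (x * 2 + 1).sum()


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    batch = torch.rand(32, 3, 64, 64, device=device)
    print(profile_by_memory(step, batch))
```

Note that the memory columns sum allocations over all calls of an operator, so a large total does not necessarily mean that much memory was live at the same moment.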

Two of the worst offenders are aten::empty_strided and aten::empty, which together account for about 9 GiB (5.75 + 3.21), roughly 80% of the 11.17 GiB on my GPU.

I am not calling either of these functions anywhere in my code, so I have no idea why they are apparently being called so many times.

Does anyone have any idea why this is happening, and how I can fix it?

empty_strided is most likely called internally by another operation, e.g. by linalg methods or by the to() operation, which has to allocate a new output tensor before copying data into it.
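A minimal CPU-only check, assuming a recent PyTorch, shows that a dtype conversion through `to()` records `aten::empty_strided` even though user code never calls it. The helper `ops_called_by` is hypothetical, and the exact intermediate ops vary by version (newer releases insert `aten::_to_copy` between `aten::to` and `aten::empty_strided`).

```python
import torch
from torch.profiler import profile, ProfilerActivity


def ops_called_by(fn):
    """Hypothetical helper: names of all operators recorded while running fn()."""
    with profile(activities=[ProfilerActivity.CPU]) as prof:
        fn()
    return {evt.key for evt in prof.key_averages()}


x = torch.rand(1024, 1024)  # float32 on CPU

# A dtype conversion needs a fresh buffer, so to() allocates one internally
conversion_ops = ops_called_by(lambda: x.to(torch.float64))
print(sorted(op for op in conversion_ops if op.startswith("aten::")))

# If dtype and device already match, to() returns the same tensor (no allocation)
assert x.to(torch.float32) is x
```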
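Based on the profile and the number of calls (74 for aten::to vs. 73 for aten::empty_strided, with the same 986.79 Mb of CPU memory), I would assume it's indeed the aten::to operation that calls empty_strided internally and runs out of memory.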


You’re absolutely right, it was the use of to(). Thanks for your help!
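The thread doesn't say which to() calls were at fault, so the following is only a sketch of one common fix, assuming the results tensor was being created on one device and then copied to the GPU. It moves the input once per batch and allocates the results tensor directly on the target device. `extract_features` and `process_batch` are hypothetical stand-ins for the real pipeline.

```python
import torch


def extract_features(images):
    # Hypothetical stand-in for the real feature extractor: per-channel means
    return images.mean(dim=(2, 3))


def process_batch(images, device):
    """Hypothetical pipeline: allocate results on `device` instead of copying them there."""
    images = images.to(device)  # one transfer per batch; a no-op if already on device
    batch_size, channels = images.shape[:2]
    # Allocate on the target device up front, rather than torch.zeros(...).to(device)
    results = torch.zeros(batch_size, channels, device=device, dtype=images.dtype)
    results[:] = extract_features(images)
    return results * 2 + 1  # stand-in for the "additional transformations"


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    out = process_batch(torch.rand(32, 3, 64, 64), device)
    print(out.shape)
```

Allocating with `device=` avoids a temporary copy of the full tensor on each call, which matters when to() runs dozens of times per batch as in the profile above.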