I have to run my U-Net model on a large 3D image, so I chop it into smaller portions, run the model on each portion individually, and then put the results back together.
code_1.py chops the large 3D image into smaller portions and calls code_2.py inside a for loop, passing one small piece of the chopped image to each call.
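For illustration, the chopping step looks conceptually like the minimal sketch below (the helper name, file names, and the fixed 64×64×64 patch size are made-up placeholders; the final step that stitches the outputs back together is omitted):

import numpy as np

def chop_volume(volume, patch_size, out_prefix='patch'):
    # Split a 3D array into non-overlapping patches, save each one to disk,
    # and return the saved file names (made-up helper and naming scheme).
    dz, dy, dx = patch_size
    paths = []
    for z in range(0, volume.shape[0], dz):
        for y in range(0, volume.shape[1], dy):
            for x in range(0, volume.shape[2], dx):
                path = f'{out_prefix}_{z}_{y}_{x}.npy'
                np.save(path, volume[z:z + dz, y:y + dy, x:x + dx])
                paths.append(path)
    return paths

large_img_list = chop_volume(np.load('large_volume.npy'), (64, 64, 64))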
Everything works fine when I call code_2.py from code_1.py like this:
for small_img in large_img_list:
    torch.cuda.empty_cache()
    os.system('%s %s %s' % ('python3', 'code_2.py', small_img))
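Equivalently, each call could be made with subprocess.run; the point is the same either way: every piece runs in its own python3 process, and that process's CUDA context and GPU memory are released when it exits. A quick sketch:

import subprocess
import torch

for small_img in large_img_list:
    torch.cuda.empty_cache()
    # Each patch is processed by a separate python3 process, so its GPU
    # memory cannot accumulate across iterations.
    subprocess.run(['python3', 'code_2.py', small_img], check=True)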
However, when I move the code of code_2.py into a function (still in code_2.py), import that file into code_1.py, and call the function instead, I run out of CUDA memory. The code is below.
# This is code_1.py
from code_2 import *

for small_img in large_img_list:
    torch.cuda.empty_cache()
    RunModels(small_img)
Why is this? I clear every tensor I allocate inside code_2.py:
torch.cuda.empty_cache()
del image
del mask
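A minimal, simplified sketch of the kind of thing RunModels() does (not my real function: a single Conv3d stands in for the U-Net, and small_img is assumed to be the path of a saved .npy patch):

import numpy as np
import torch
import torch.nn as nn

def RunModels(small_img):
    # Stand-in for the real U-Net: one 3D convolution keeps the sketch runnable.
    model = nn.Conv3d(1, 1, kernel_size=3, padding=1).cuda().eval()
    # Assumes small_img is the path to a saved patch.
    image = torch.from_numpy(np.load(small_img)).float()[None, None].cuda()
    with torch.no_grad():          # no autograd graph is kept during inference
        mask = model(image)
    result = mask.cpu().numpy()    # move the output off the GPU before cleanup
    torch.cuda.empty_cache()
    del image
    del mask
    return result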
Maybe this is a Python problem rather than a PyTorch problem?
Your import might already be executing some code and maybe even initializing a (separate) CUDA context. You could check this by adding print statements to the methods in code_2 and also by checking the memory usage right after the import.
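For example, a few prints at the top of code_1.py (just a sketch) would show whether the import alone allocates anything through PyTorch:

import torch

print("allocated before import:", torch.cuda.memory_allocated(0))
print("reserved before import:", torch.cuda.memory_reserved(0))

import code_2  # if these numbers change, the import itself allocated GPU memory via PyTorch

print("allocated after import:", torch.cuda.memory_allocated(0))
print("reserved after import:", torch.cuda.memory_reserved(0))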
Thank you for your answer. You are right: when I import code_2.py, it does execute the lines in the global scope of code_2.py, which is understandable, but obviously it does not execute the function RunModels(), which is inside code_2.py and does all the work.
However, when I print the memory statistics right before and right after the import, no memory is allocated:
t = torch.cuda.get_device_properties(0).total_memory
r = torch.cuda.memory_reserved(0)
a = torch.cuda.memory_allocated(0)
f = r - a  # free inside reserved
print("\ttotal_memory\t", t)
print("\tmemory_reserved\t", r)
print("\tmemory_allocated\t", a)
print("\tfree inside reserved\t", f, "\n\n")