The main process in the code snippet below freezes after several iterations.
I think it's related to how PyTorch's data structures are stacked: tensors are built on top of storage classes, and storage classes are built on top of raw cudaMalloc regions. I understand what
rebuild_cuda_tensor() does, but I am not sure why creating a new tensor (since g_tensor has been reassigned) after a blocking starmap would cause the main process to freeze.
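(As a side note on the tensor/storage layering mentioned above, several tensors can be views over one storage, and the underlying allocation only goes away when the last reference to that storage does. A minimal CPU-side illustration, not tied to this bug:)

```python
import torch as t

x = t.zeros(2, 3)
y = x.view(6)  # a different tensor object over the same storage

# both tensors point at the same underlying allocation
assert x.data_ptr() == y.data_ptr()

y[0] = 1.0
# a write through one view is visible through the other
assert x[0, 0].item() == 1.0
```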
```python
import itertools as it
import torch as t
import torch.multiprocessing as mp


def infer(id, tensor):
    print(id)
    print(tensor)
    # del tensor immediately doesn't solve the problem
    del tensor


# some global tensor
g_tensor = t.full([1000, 1000], 2, device="cuda:0")
g_tensor.share_memory_()

if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    pool = ctx.Pool(2)
    for i in range(10000000):
        print("start")
        pool.starmap(infer, zip(range(5), it.repeat(g_tensor)))
        # cpu tensors work just fine
        # for cuda tensors:
        # if I delete the global tensor, reassign it with a new cuda tensor
        # or if I use a tensor created dynamically in each iteration
        # the program freezes after 2 iterations.
        # Comment out the following lines and everything will work fine.
        del g_tensor
        g_tensor = t.full([1000, 1000], 2, device="cuda:0")
        g_tensor.share_memory_()
```
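For what it's worth, the hang would be consistent with the workers still holding a CUDA IPC mapping of the old allocation at the moment it is freed and replaced. A sketch of a possible workaround (my assumption, not verified against this exact setup) is to keep the shared tensor's storage stable and update its contents in place with `fill_`/`copy_` instead of reallocating each iteration:

```python
import itertools as it
import torch as t
import torch.multiprocessing as mp


def infer(idx, tensor):
    print(idx, tensor.mean().item())


if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    pool = ctx.Pool(2)
    # allocate the shared CUDA tensor once, up front
    g_tensor = t.full([1000, 1000], 2.0, device="cuda:0")
    g_tensor.share_memory_()
    for i in range(100):
        pool.starmap(infer, zip(range(5), it.repeat(g_tensor)))
        # update in place: the storage (and hence the IPC handle the
        # workers map) never changes, so nothing is freed under them
        g_tensor.fill_(float(i))
    pool.close()
    pool.join()
```

The key property is that an in-place write never changes the tensor's underlying allocation, so the data pointer the consumers received stays valid.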
- PyTorch Version (e.g., 1.0): 1.1.0
- OS (e.g., Linux): Linux
- How you installed PyTorch (conda, pip, source): pip
- Python version: 3.5
- CUDA/cuDNN version: 9.1/7.2.1