Memory leak when using RPC for pipeline parallelism

Hi Yi Wang,

Thanks for your reply and the useful link! I found that this memory leak is related to num_worker_threads=256 in options = rpc.TensorPipeRpcBackendOptions(num_worker_threads=256). Following your suggestion and the link, I first added:

import gc
import torch

# Print every live tensor tracked by the GC so counts can be compared between passes.
for obj in gc.get_objects():
    try:
        if torch.is_tensor(obj) or (hasattr(obj, 'data') and torch.is_tensor(obj.data)):
            print(f"After forward: {type(obj)}, {obj.size()}")
    except Exception:
        pass

to trace the tensors, and I added del ... where needed so that the number of live tensors is the same across forward passes. But the memory still keeps growing.
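
For reference, this is roughly the kind of check I run between passes; count_live_tensors is just an illustrative name, not something from the tutorial:

import gc
import torch

def count_live_tensors():
    # Count tensors (and tensor-holding objects) currently tracked by the GC.
    n = 0
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) or (hasattr(obj, 'data') and torch.is_tensor(obj.data)):
                n += 1
        except Exception:
            pass
    return n

before = count_live_tensors()
# ... run one forward pass here ...
after = count_live_tensors()
# The count stays flat for me, even though the process memory keeps growing.
print(f"live tensors: {before} -> {after}")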

However, when I reduce num_worker_threads from 256 to 2 in options = rpc.TensorPipeRpcBackendOptions(num_worker_threads=256), the memory stops growing. In fact, the memory only grows significantly when I set num_worker_threads > 6.
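
For context, this is roughly how I initialize RPC with the reduced pool; the worker name, rank, and world_size below are placeholders from my launch script:

import torch.distributed.rpc as rpc

# Shrinking the thread pool from 256 to 2 is what stopped the memory growth for me.
options = rpc.TensorPipeRpcBackendOptions(num_worker_threads=2)

# Placeholder name/rank/world_size; assumes MASTER_ADDR/MASTER_PORT are set in
# the environment, as in the tutorial.
rpc.init_rpc("worker0", rank=0, world_size=2, rpc_backend_options=options)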

I could find only a little information about this argument in the PyTorch documentation:

  • num_worker_threads (int, optional) – The number of threads in the thread-pool used by TensorPipeAgent to execute requests (default: 16).

And the pipeline parallelism tutorial sets it to 128, which I assumed was related to communication/request throughput (see the sketch after the questions below for the kind of concurrent requests I mean). I am not familiar with this option and wonder:

  1. Why does a larger num_worker_threads cause the memory leak?
  2. When do we need a larger num_worker_threads?
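
For what it's worth, my (possibly wrong) mental model is that each in-flight request occupies one thread from that pool on the callee while it runs, so many concurrent micro-batches would be the case that needs a bigger pool. A rough sketch of what I mean, where heavy_stage and the worker names are made up:

import torch
import torch.distributed.rpc as rpc

def heavy_stage(x):
    # Hypothetical remote stage standing in for one pipeline partition.
    return x * 2

# Assumes rpc.init_rpc(...) has already been called on both workers.
# Fire several requests at once; each one is executed by a thread from the
# callee's pool while it is in flight.
futures = [rpc.rpc_async("worker1", heavy_stage, args=(torch.ones(2),)) for _ in range(8)]
results = [fut.wait() for fut in futures]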

Do you have any related experience or suggestions? I need your help! Thank you very much!

Best,
Yang