Hi everyone, what is the best practice to share a massive CPU tensor across multiple processes (read-only + single machine + DDP)?
I think `torch.Storage` (see the PyTorch 1.9.0 documentation) with `share=True` suits my needs, but I can't find a way to save the storage and read it back as a tensor (the same issue has been open since Oct 29, 2019).
I also tried copying the training data to /dev/shm (reference) and running DDP with 8 GPUs, but nothing changed: the memory usage with 8 GPUs is the same as before, even though in a single-process test loading the dataset occupies about 1 GB of memory. Am I missing something here?
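For the "save storage and read it back as a tensor" part, one option is to write the data out once as a raw binary file (e.g. under /dev/shm) and memory-map it in every process with `torch.from_file(..., shared=True)`, so all processes share a single set of pages. A minimal sketch, where `/dev/shm/dataset.bin` is just a placeholder path for illustration:

```python
import torch

# Write the dataset once as a raw binary file; "/dev/shm/dataset.bin"
# is a placeholder path for this sketch.
data = torch.randn(1000, dtype=torch.float32)
path = "/dev/shm/dataset.bin"
data.numpy().tofile(path)

# Each DDP process can then mmap the same file instead of loading its
# own copy: shared=True maps the file, so the pages are shared.
shared = torch.from_file(path, shared=True, size=data.numel(),
                         dtype=torch.float32)
assert torch.equal(shared, data)
```

Since the mapping is backed by one file, the ~1 GB shows up once in physical memory no matter how many DDP workers map it.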
Thanks for posting the question @siahuat0727! Did you try `torch.multiprocessing.Queue` to pass the tensor objects between processes? You can take a look at the `torch.multiprocessing` docs and see if this works for you.
You can also use `tensor.share_memory_()` for sharing a big tensor across processes.
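`Tensor.share_memory_()` moves the tensor's storage into shared memory in place, after which any process you hand the tensor to reads the same buffer. A minimal sketch (again using the `fork` start method on Linux so the forked workers simply inherit the tensor):

```python
import torch
import torch.multiprocessing as mp

def use(data):
    # each worker reads the same underlying buffer; no per-process copy
    assert data.is_shared()
    _ = data.sum()

ctx = mp.get_context("fork")   # guard-free sketch on Linux
data = torch.randn(10_000)     # stand-in for a large CPU dataset
data.share_memory_()           # moves the storage into shared memory in place
workers = [ctx.Process(target=use, args=(data,)) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Note the trailing underscore: `share_memory_()` is an in-place operation, and calling it before spawning workers is what prevents each worker from holding its own ~1 GB copy.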
Hi @wanchaol, thank you very much for your reply!
`tensor.share_memory_()` should work well in a pure multiprocessing program.
I will look into how to pass the reference in pytorch-lightning DDP mode.
Finally, I found a way to make it work. For details, see here.