What is the best practice to share a massive CPU tensor over multiple processes (read-only + single machine + DDP)?

Hi everyone, what is the best practice to share a massive CPU tensor over multiple processes (read-only + single machine + DDP)?

I think torch.Storage (torch.Storage — PyTorch 1.9.0 documentation) with shared=True suits my needs, but I can’t find a way to save the storage and read it back as a tensor (there is still an open issue about this from Oct 29, 2019).

I also tried copying the training data to /dev/shm (reference) and running DDP with 8 GPUs, but nothing changed: the memory usage with 8 GPUs is the same as before. From a single-process test, loading the dataset occupies about 1 GB of memory. Am I missing something here?

Thanks for posting the question @siahuat0727. Did you try torch.multiprocessing.Queue to pass the tensor objects between the processes? You can take a look at the `torch.multiprocessing` doc and see if this works for you.
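For illustration, here is a minimal sketch of the Queue approach. The tensor size, worker count, and `worker` function are placeholders, not from the thread; the point is that a CPU tensor put on a `torch.multiprocessing` queue is moved to shared memory, so the workers receive a handle to the same storage rather than a private copy.

```python
import torch
import torch.multiprocessing as mp

def worker(q):
    t = q.get()  # handle to the shared storage, not a per-process copy
    print("worker got tensor with shape", tuple(t.shape), "sum =", t.sum().item())

if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    big = torch.randn(1000, 1000)  # stand-in for the massive CPU tensor
    procs = [ctx.Process(target=worker, args=(q,)) for _ in range(4)]
    for p in procs:
        p.start()
    for _ in procs:
        q.put(big)  # torch.multiprocessing moves the underlying storage to shared memory
    for p in procs:
        p.join()
```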

You can also use tensor.share_memory_() for sharing a big tensor across processes.
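And a sketch of the share_memory_() route, again with made-up sizes. A tensor whose storage has been moved to shared memory and is then passed to spawned workers is rebuilt from the same shared-memory storage instead of being copied:

```python
import torch
import torch.multiprocessing as mp

def worker(rank, shared_tensor):
    # All workers read the same shared-memory storage; no per-process copy of the data.
    print(f"rank {rank}: shape {tuple(shared_tensor.shape)}, first element {shared_tensor[0, 0].item()}")

if __name__ == "__main__":
    big = torch.randn(1000, 1000)  # stand-in for the massive CPU tensor
    big.share_memory_()            # move the storage into shared memory, in place
    mp.spawn(worker, args=(big,), nprocs=4)
```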

Hi @wanchaol, thank you very much for your reply!
I think tensor.share_memory_() should work well in a pure multiprocessing program.
I will look into how to pass the reference in PyTorch Lightning DDP mode.

Finally, I found a way to utilize torch.Storage.from_file.
For details, see here.
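For anyone landing here later, a minimal sketch of that approach. The path /dev/shm/big_tensor.bin, the float32 dtype, and the (1000, 1000) shape are made-up placeholders; the calls follow the torch.FloatStorage.from_file API as documented for PyTorch 1.9:

```python
import torch

shape = (1000, 1000)              # placeholder shape for the big tensor
numel = shape[0] * shape[1]
path = "/dev/shm/big_tensor.bin"  # placeholder path on a tmpfs mount

# One-time writer: create the file-backed shared storage and fill it with the data.
storage = torch.FloatStorage.from_file(path, shared=True, size=numel)
torch.FloatTensor(storage).view(shape).copy_(torch.randn(shape))

# In each DDP worker: map the same file. The pages are shared between processes,
# so the ~1 GB dataset occupies physical memory only once, no matter how many
# workers open it.
storage = torch.FloatStorage.from_file(path, shared=True, size=numel)
big = torch.FloatTensor(storage).view(shape)
```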