What is the best practice for sharing a massive CPU tensor across multiple processes (read-only + single machine + DDP)?

Finally, I found a way to do this using torch.Storage.from_file.
For details, see here.
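
To illustrate the idea, here is a minimal sketch of how `from_file` can back one tensor with a single memory-mapped file that all DDP workers on the machine share. The path, shape, and helper names below are my own assumptions, not from the original post; it uses the typed `torch.FloatStorage.from_file(filename, shared, size)` API (newer PyTorch versions expose the same thing as `torch.UntypedStorage.from_file(filename, shared=True, nbytes=...)`).

```python
import torch

# Hypothetical path and size -- replace with your own data.
# A tmpfs location such as /dev/shm keeps the mapping in RAM.
PATH = "/dev/shm/features.bin"
NUMEL = 100_000_000  # number of float32 elements


def build_shared_tensor():
    """Run once (e.g. on rank 0): create the file-backed storage and fill it.

    shared=True maps the file with MAP_SHARED, so every process that maps
    the same file sees -- and physically shares -- the same pages.
    """
    storage = torch.FloatStorage.from_file(PATH, shared=True, size=NUMEL)
    tensor = torch.FloatTensor(storage)
    tensor.copy_(torch.randn(NUMEL))  # fill with your real data here
    return tensor


def map_shared_tensor():
    """Run in every DDP worker: map the same file instead of loading a copy.

    This is a zero-copy view of the shared pages; as long as workers only
    read, the tensor behaves as a read-only shared buffer and the memory
    cost is paid only once per machine.
    """
    storage = torch.FloatStorage.from_file(PATH, shared=True, size=NUMEL)
    return torch.FloatTensor(storage)
```

The key point is that each rank calls `from_file` on the same file rather than receiving the tensor through pickling or IPC, so process startup stays cheap and resident memory does not scale with the number of workers.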