Some confusion about torch.multiprocessing.spawn in pytorch

I get really confused when I use torch.multiprocessing.spawn. Consider the following code:

import torch
import torch.multiprocessing as mp


x = [1, 2]

def f(id, a):
    print(x)
    print(a)

if __name__ == '__main__':
    x.append(3)
    mp.spawn(f, nprocs=2, args=(x, ))

Each process spawned by the main function outputs the following:

[1, 2]
[1, 2, 3]

I have the following questions:
(1) Why is the first line of output [1, 2]? I thought x is a global variable and that forking a new process shares memory on Linux, as described here: https://stackoverflow.com/questions/5983159/python-multiprocessing-arguments-deep-copy
(2) Are the parameters in spawn deep-copied into the new processes, or is just a reference passed?

Thank you very much!

I have the exact same issue with torch.multiprocessing.spawn (mp.spawn) used for distributed parallel training.

I have a large dataset of CSV files that I convert to a shared multiprocessing NumPy array outside of my main function to avoid blowing up memory, but mp.spawn makes multiple copies of it anyway.
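One possible workaround, sketched below with made-up names and assuming Python 3.8+ for multiprocessing.shared_memory: keep the array in a named shared-memory block and pass only its name, shape, and dtype through mp.spawn, so each worker attaches to the same buffer instead of receiving a pickled copy.

# Sketch of a possible workaround (not from the original post): put the NumPy
# array into a named shared-memory block and pass only its name/shape/dtype
# through mp.spawn, so workers attach to the same buffer instead of copying it.
import numpy as np
import torch.multiprocessing as mp
from multiprocessing import shared_memory


def worker(rank, shm_name, shape, dtype):
    # Attach to the existing shared block; no copy of the data is made here.
    shm = shared_memory.SharedMemory(name=shm_name)
    data = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    print(rank, data[:5])
    shm.close()


if __name__ == '__main__':
    big = np.arange(10_000_000, dtype=np.float32)  # stands in for the CSV data
    shm = shared_memory.SharedMemory(create=True, size=big.nbytes)
    shared = np.ndarray(big.shape, dtype=big.dtype, buffer=shm.buf)
    shared[:] = big  # one-time copy into shared memory

    mp.spawn(worker, nprocs=2, args=(shm.name, big.shape, big.dtype))

    shm.close()
    shm.unlink()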


It looks like Python's multiprocessing module also copies the data if we use the spawn start method:

import multiprocessing as mp

x = [1, 2]

def foo(a):
    print(x)  # [1, 2]
    print(a)  # [1, 2, 3]

if __name__ == '__main__':
    mp.set_start_method("spawn")
    x.append(3)
    p = mp.Process(target=foo, args=(x,))
    p.start()
    p.join()
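For question (1): with the spawn start method each worker starts a fresh interpreter and re-imports the main module, so the module-level x = [1, 2] runs again in the child, while the if __name__ == '__main__': block (and therefore x.append(3)) does not. The argument, on the other hand, is pickled in the parent after the append, which is why it arrives as [1, 2, 3]. A small sketch to watch the re-import (the extra prints are additions for illustration, not part of the original example):

# Sketch: the original example with a module-level print added to observe the
# re-import that the spawn start method performs in each child process.
import os
import torch.multiprocessing as mp

x = [1, 2]
print(f"module imported in pid {os.getpid()} with __name__ = {__name__!r}")
# Parent: __name__ == '__main__'. Spawned children: '__mp_main__', so the
# guarded block below (and hence x.append(3)) never runs in them.


def f(rank, a):
    print(x)  # [1, 2]    -- rebuilt by re-importing this module in the child
    print(a)  # [1, 2, 3] -- pickled in the parent after the append


if __name__ == '__main__':
    x.append(3)
    mp.spawn(f, nprocs=2, args=(x,))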

To answer your question, there is a deepcopy, though shared memory will be used for Tensor data.
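A quick sketch to check both halves of that claim (the in-place mutations below are illustrative, not from the thread): after spawn returns, the parent still sees the original list, but it does see the workers' writes to the tensor, because the tensor's storage was moved into shared memory when it was pickled for the child processes.

# Sketch: the list is copied per worker, while the tensor's storage is moved
# into shared memory, so in-place edits are visible in the parent after spawn().
import torch
import torch.multiprocessing as mp


def worker(rank, a_list, a_tensor):
    a_list.append(rank)        # modifies this worker's private copy only
    a_tensor[rank] = rank + 1  # writes into storage shared with the parent


if __name__ == '__main__':
    lst = [1, 2, 3]
    t = torch.zeros(2)
    mp.spawn(worker, nprocs=2, args=(lst, t))
    print(lst)  # [1, 2, 3] -- the appends in the workers are not visible
    print(t)    # tensor([1., 2.]) -- the in-place writes to the tensor are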

From the pytorch multiprocessing “best practices” page (https://pytorch.org/docs/stable/notes/multiprocessing.html):

We recommend using multiprocessing.Queue for passing all kinds of PyTorch objects between processes. It is possible to e.g. inherit the tensors and storages already in shared memory, when using the fork start method, however it is very bug prone and should be used with care, and only by advanced users. Queues, even though they’re sometimes a less elegant solution, will work properly in all cases.
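For completeness, a minimal sketch of the Queue approach recommended there (the consumer function and the in-place update are made up for illustration): a tensor put on a torch.multiprocessing Queue is moved into shared memory, so the receiving process works on the same data rather than a copy.

# Sketch of the Queue-based approach from the quoted docs.
import torch
import torch.multiprocessing as mp


def consumer(q):
    t = q.get()  # the received tensor maps the same shared-memory storage
    t += 1       # in-place update, visible to the parent


if __name__ == '__main__':
    ctx = mp.get_context('spawn')
    q = ctx.Queue()
    p = ctx.Process(target=consumer, args=(q,))
    p.start()
    t = torch.zeros(3)
    q.put(t)     # putting the tensor moves its storage into shared memory
    p.join()
    print(t)     # tensor([1., 1., 1.]) -- the child's update is visible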

You could thus use the fork start method with PyTorch multiprocessing to avoid the copy, though as the docs mention this is bug-prone and only recommended for advanced users.
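If you want to try that route, torch.multiprocessing also provides a start_processes helper that accepts a start_method argument (check that it exists in your version); a sketch of the fork variant of the original example, assuming that helper:

# Sketch, assuming torch.multiprocessing.start_processes is available and takes
# the same arguments as spawn plus start_method. With fork, the child inherits
# the parent's memory, so the global already contains the appended 3.
import torch.multiprocessing as mp

x = [1, 2]


def f(rank, a):
    print(x)  # [1, 2, 3] with fork -- the child inherits the parent's state
    print(a)  # [1, 2, 3]


if __name__ == '__main__':
    x.append(3)
    # Note: fork is not safe if CUDA has already been initialized in the parent.
    mp.start_processes(f, nprocs=2, args=(x,), start_method='fork')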

To answer your question, there is a deepcopy, though shared memory will be used for Tensor data.

Is this documented somewhere? I couldn’t find it in [1] or [2], but I observed this behavior by writing a simple script.

[1] Multiprocessing best practices — PyTorch 2.1 documentation
[2] Multiprocessing package - torch.multiprocessing — PyTorch 2.1 documentation