`Exception: process 0 terminated with exit code 1` error when using `torch.multiprocessing.spawn` to parallelize over multiple GPUs

Then you can use a for-loop to divide X_prime into smaller chunks, just as your old code did. Just don't split them too finely, for example into row-by-row operations.
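A minimal sketch of that chunked loop, under stated assumptions: a plain list of rows stands in for X_prime, and `process_in_chunks`, `row_sums`, and `chunk_size` are hypothetical names chosen for illustration, not taken from the original code. Only one slice is processed at a time, so peak memory scales with the chunk size rather than with the whole input.

```python
def process_in_chunks(rows, chunk_size, process_chunk):
    """Apply process_chunk to consecutive slices of rows, one slice at a time."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    results = []
    for start in range(0, len(rows), chunk_size):
        # Only this slice is being worked on in the current iteration.
        chunk = rows[start:start + chunk_size]
        results.extend(process_chunk(chunk))
    return results


def row_sums(chunk):
    # Hypothetical per-chunk work: sum each row in the chunk.
    return [sum(row) for row in chunk]


# Stand-in for X_prime: 10 rows of 3 numbers each (an assumption for this demo).
X_prime = [[i, i + 1, i + 2] for i in range(10)]

# chunk_size=4 gives slices of 4, 4, and 2 rows, processed sequentially.
chunked = process_in_chunks(X_prime, chunk_size=4, process_chunk=row_sums)
```

With a real tensor, `torch.split(X_prime, chunk_size, dim=0)` yields the same kind of consecutive slices. A larger `chunk_size` means fewer iterations but more memory per step, so it is worth tuning to whatever fits on the device.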

Time for space, or space for time.


You’re right! Split X_prime into chunks and run the chunks one at a time, rather than running all of them in parallel, which is what my old code does. Genius!

Once again, a massive thank you @iffiX, couldn’t do it without you!

Hi @mrshenli, does the data need to be encoded on one side, pushed through a pipe, and decoded on the other side? I tried putting some big graphs in the queue this way, and the get() call takes a pretty long time. Is this a good way to share big data?
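For context, Python's standard `multiprocessing.Queue` does work that way: `put()` pickles the object and writes the bytes to a pipe, and `get()` unpickles them on the receiving side. The sketch below is only a rough way to estimate that cost without spawning any processes. The `graph` dictionary is a hypothetical stand-in for a big graph, and the timings will vary by machine.

```python
import pickle
import time

# Hypothetical "big graph": adjacency lists for 50,000 nodes (an assumption).
NUM_NODES = 50_000
graph = {
    node: [(node + k) % NUM_NODES for k in range(1, 6)]
    for node in range(NUM_NODES)
}

# Encoding step: roughly what happens on the sending side of the queue.
start = time.perf_counter()
payload = pickle.dumps(graph, protocol=pickle.HIGHEST_PROTOCOL)
encode_seconds = time.perf_counter() - start

# Decoding step: roughly what happens inside get() on the receiving side.
start = time.perf_counter()
restored = pickle.loads(payload)
decode_seconds = time.perf_counter() - start

print(
    f"payload: {len(payload) / 1e6:.1f} MB, "
    f"encode: {encode_seconds:.3f}s, decode: {decode_seconds:.3f}s"
)
```

If the decode time here is close to the delay seen in get(), serialization is probably the bottleneck. As far as I understand, tensors sent through `torch.multiprocessing` queues are moved to shared memory and only a handle is transferred, so storing the graph as tensors may avoid most of this copy.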

It took me quite some time to figure this out, so hopefully it is useful for someone.
In my case I was using Jupyter Notebook, which caused the exact same error. I wrote a short blog post about it, with code examples for a fully working PyTorch multiprocessing setup with a Queue:
https://dataiskey.eu/jupyter-notebook-pytorch-multiprocessing/
GitHub repository with just the code example:
https://github.com/olympus999/jupyter-notebook-pytorch-multiprocessing-queue