Why does the code shown below either finish normally or hang depending on which lines are commented/uncommented, as described in the table below?
Summary of the table: if I initialise sufficiently large tensors in both processes without using `"spawn"`, the program hangs. I can fix it by making either tensor smaller, or by using `"spawn"`.
Notes:
- All memory is purely CPU; I don’t even have CUDA installed on this computer.
- This issue does not occur if I replace `torch` with `numpy`, even if I make the array size 10x larger.
- Version information: Ubuntu 22.04.1 LTS, Python 3.10.12, torch 2.1.2+cpu
| Uncommented lines | Commented-out lines | Behaviour |
|---|---|---|
| (1), (4) | (2), (3), (5) | Hang |
| (2), (4) | (1), (3), (5) | OK |
| (1), (5) | (2), (3), (4) | OK |
| (1), (3), (4) | (2), (5) | OK |
```python
import multiprocessing as mp
import torch

def train():
    print("start of train")
    x = torch.arange(100000)  # (1)
    x = torch.arange(10000)   # (2)
    print("end of train")

if __name__ == "__main__":
    mp.set_start_method('spawn')  # (3)
    x = torch.arange(100000)      # (4)
    x = torch.arange(10000)       # (5)
    p = mp.Process(target=train)
    p.start()
    p.join()
```
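To re-run the rows of the table without editing comments each time, a small harness can launch the child with an explicit start method and treat a child that is still alive after a timeout as a hang. This is a minimal sketch using only the standard library; `run_in_child` is a hypothetical helper, and the 10-second default timeout is an arbitrary assumption about what counts as "hanging". It uses `multiprocessing.get_context()`, which selects a start method for the processes created from that context instead of setting it globally with `set_start_method()`, so fork and spawn can be compared in the same script.

```python
import multiprocessing as mp


def run_in_child(target, start_method="fork", timeout=10.0):
    """Run `target` in a child process and report how it ended.

    Hypothetical helper. Returns:
      "OK"    if the child exits with code 0 within `timeout` seconds,
      "Error" if it exits with a non-zero code within `timeout` seconds,
      "Hang"  if it is still alive after `timeout` seconds (it is then terminated).
    """
    # get_context() picks the start method for this context only;
    # set_start_method() changes the global default and should not be
    # called more than once per program.
    ctx = mp.get_context(start_method)
    p = ctx.Process(target=target)
    p.start()
    p.join(timeout)
    if p.is_alive():
        # Still running after the timeout: treat it as a hang and clean up.
        p.terminate()
        p.join()
        return "Hang"
    return "OK" if p.exitcode == 0 else "Error"
```

For example, inside the `if __name__ == "__main__":` block with lines (1) and (4) active, `run_in_child(train, "fork")` corresponds to the first row of the table and `run_in_child(train, "spawn")` to the last row, assuming `train` is the module-level function from the script above.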