```python
import torch as tc
import torch.multiprocessing as mp

def do_something():
    b = tc.randn(1000, 1000)
    print(f'A tensor is created in process {mp.current_process().name}')
    print(b.sum())  # this line hangs forever

def bug():
    # a = tc.randn(1000, 1000)
    a = tc.ones(1000, 1000)  # commenting this line out and reviving the line above makes the bug go away
    p = mp.Process(target=do_something, args=())
    p.start()
    p.join()

if __name__ == '__main__':
    bug()
```
My torch version is 1.10.2.
It is crazy that creating a tensor in the parent process causes a deadlock in the child process, even though the tensor is never passed to the child. It is even crazier that if I replace the tensor in the parent process with another one, the deadlock goes away!
Is it generally not recommended to use forked subprocesses in PyTorch?
Thanks.
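One workaround often suggested for fork-related hangs like this (a sketch, not verified against your exact setup) is to start the child with the `'spawn'` method, so it gets a fresh interpreter instead of inheriting the parent's thread and lock state:

```python
import torch
import torch.multiprocessing as mp

def do_something():
    b = torch.randn(1000, 1000)
    print(f'A tensor is created in process {mp.current_process().name}')
    print(b.sum())  # should not hang when the child is spawned rather than forked

if __name__ == '__main__':
    # a spawned child starts a fresh interpreter, so it does not inherit
    # the parent's OpenMP/threading state, which is what fork can corrupt
    ctx = mp.get_context('spawn')
    p = ctx.Process(target=do_something)
    p.start()
    p.join()
```

`torch.multiprocessing` is a drop-in wrapper around the standard `multiprocessing` module, so `get_context('spawn')` works the same way with either.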
I ran your code and did not see any bugs; it completed smoothly. My torch is 1.10.0.
I'm having a similar problem with Python 3.10 and torch 2.0/2.2 (didn't try 2.1). Here is my test code:
```python
import torch
import multiprocessing
import numpy as np

def test(n):
    x = np.zeros((n, n))
    print('tensor')
    x = torch.tensor(x)  # the subprocess gets stuck here
    print('done')
    return x

def main():
    tensor = torch.zeros(200, 1000)
    # tensor = torch.zeros(200, 100)  # this works; the bug seems to go away with a small tensor
    # neither torch.multiprocessing nor the built-in multiprocessing works
    # pool = torch.multiprocessing.Pool(1)
    pool = multiprocessing.Pool(1)
    pool.apply(test, (1000,))

if __name__ == '__main__':
    main()
```
Did you figure out a solution, or have any clues?
Thanks.
edit: the bug seems to happen on Linux but goes away on Windows
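That Linux/Windows difference lines up with Python's default start methods: `multiprocessing` forks on Linux (through Python 3.13) but spawns on Windows and macOS, and only a forked child inherits the parent's threading state. A quick way to check what you're getting:

```python
import multiprocessing

# 'fork' on Linux (Python <= 3.13), 'spawn' on Windows and macOS
print(multiprocessing.get_start_method())
```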
edit2: found the cause here