The code below blocks when creating x1. If I remove the line print("finish create\n", x), it works fine. This really bothers me. Thanks to anyone who can help.
import torch
import torch.multiprocessing as multiprocessing

def run():
    print("run")
    device = torch.device("cpu")
    x1 = torch.rand(720, 1280, dtype=torch.float32, device=device)
    print('a', x1)
    x2 = x1.to(dtype=torch.float64)
    print('b', x2.dtype)
    x = torch.rand(1080, 1920, device=torch.device("cpu"))
    print("finish create\n", x)

thread = multiprocessing.Process(target=run)
thread.start()
print("wait")
thread.join()
print("finish")
@MoriartyShan
The issue was with this line:
    print('a', x1)
If you convert it to
    print('a', x1.detach().numpy())
it works. I am not sure why this is happening; we need to get to the bottom of it. But printing tensors inside the multiprocessing function is what caused your issue.
Thanks for your advice. But after changing the code as you suggested, it blocks at
    x2 = x1.to(dtype=torch.float64)
and the line after it never executes. I think some setting in PyTorch may be causing this. Also, if I change device in run to 'cuda:0', the problem disappears, but that is really not what I want.
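One thing that may be worth trying (an assumption on my part, not something verified in this thread): the default fork start method is known to interact badly with the OpenMP/MKL thread pools that PyTorch's CPU ops use, and a forked child can deadlock on state inherited from the parent. Starting the child with the spawn method gives it a fresh interpreter and often sidesteps this. A minimal sketch:

```python
import torch
import torch.multiprocessing as multiprocessing

# Ask for the "spawn" start method locally, without changing the
# global default via set_start_method().
ctx = multiprocessing.get_context("spawn")

def run():
    device = torch.device("cpu")
    x1 = torch.rand(720, 1280, dtype=torch.float32, device=device)
    print('a', x1.shape)
    x2 = x1.to(dtype=torch.float64)
    print('b', x2.dtype)

if __name__ == "__main__":
    # spawn re-imports this module in the child, so the Process
    # creation must be guarded by __main__.
    thread = ctx.Process(target=run)
    thread.start()
    thread.join()
    print("finish, exit code:", thread.exitcode)
```

Using get_context keeps the choice of start method local to this script, which matters if other code in the process relies on the default.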
@MoriartyShan you are correct. This is weird behavior. For the time being, the following is a workaround, but I will try to get to the bottom of this. It will be a good exercise.
import torch
import torch.multiprocessing as multiprocessing
import numpy as np

def run():
    print("run")
    device = torch.device("cpu")
    print(torch.__version__)
    x1 = torch.rand(720, 1280, dtype=torch.float32, device=device)
    x2 = torch.from_numpy(x1.detach().numpy().astype(np.float64))
    print('b', x2.dtype)
    x = torch.rand(1080, 1920, device=torch.device("cpu"))
    print("finish create\n", x)

thread = multiprocessing.Process(target=run)
thread.start()
print("wait")
thread.join()
print("finish")
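In the same spirit, another workaround sometimes reported for fork-related CPU deadlocks (again an assumption, not confirmed in this thread) is to disable intra-op parallelism in the child before it touches any tensor, so the forked process never re-enters the thread pool inherited from the parent. A sketch:

```python
import torch
import torch.multiprocessing as multiprocessing

def run():
    # Assumption: restricting the child to a single intra-op thread
    # avoids re-using OpenMP state inherited across fork().
    torch.set_num_threads(1)
    x1 = torch.rand(720, 1280, dtype=torch.float32, device=torch.device("cpu"))
    x2 = x1.to(dtype=torch.float64)
    print('b', x2.dtype)

if __name__ == "__main__":
    p = multiprocessing.Process(target=run)
    p.start()
    p.join()
    print("finish")
```

This keeps everything in PyTorch (no numpy round-trip), at the cost of single-threaded CPU ops inside the child.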
Thanks. The example is simplified code from my work. Converting the tensor to a numpy array does solve the problem above. I hope there is some way to work with PyTorch tensors directly in the child process. I will try to figure it out as well.