I am trying to send a CUDA tensor from a child process back to the main process, and below is a minimal example:
import torch

def process_1(pipe: torch.multiprocessing.Queue, event: torch.multiprocessing.Event):
    a = torch.rand([100, 100], device='cuda', dtype=torch.float)
    print(f'tensor sent: {a}')
    print(torch.sum(a))
    pipe.put(a)
    event.wait()
    event.clear()
    print(f'tensor after del: {a}')

if __name__ == '__main__':
    pipe = torch.multiprocessing.Queue()
    event = torch.multiprocessing.Event()
    p = torch.multiprocessing.Process(target=process_1, args=(pipe, event))
    p.start()
    recv = pipe.get()
    print(f'tensor received: {recv}')
    print(torch.sum(recv))
    del recv
    event.set()
I ran this code on Windows and found that CUDA tensors sent through shared memory (Pipe, Queue, etc.) arrive as all-zero tensors:
tensor sent: tensor([[0.0738, 0.4750, 0.2477, ..., 0.6408, 0.8999, 0.3425],
[0.9697, 0.0946, 0.6554, ..., 0.6812, 0.9557, 0.5535],
[0.0681, 0.4022, 0.7647, ..., 0.1023, 0.1328, 0.1847],
...,
[0.8344, 0.9620, 0.5390, ..., 0.2282, 0.6173, 0.3060],
[0.1959, 0.5154, 0.5861, ..., 0.3451, 0.1385, 0.2135],
[0.3778, 0.0317, 0.0770, ..., 0.6761, 0.7165, 0.1330]],
device='cuda:0')
tensor(4990.6367, device='cuda:0')
tensor received: tensor([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], device='cuda:0')
tensor(0., device='cuda:0')
tensor after del: tensor([[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]], device='cuda:0')
I suspect this is a Windows-specific issue. Is there any workaround, or do I have to switch to PyTorch on Linux?
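For what it's worth, one workaround I would expect to avoid the problem (an untested assumption on my part, not something I've verified on Windows) is to copy the tensor to host memory before putting it on the Queue, so the transfer goes through ordinary pickling instead of CUDA shared memory, and then move it back to the GPU in the receiver:

```python
# Sketch of the CPU round-trip workaround. `worker` is a hypothetical
# stand-in for process_1 above; device selection falls back to CPU so
# the sketch also runs on machines without CUDA.
import torch
import torch.multiprocessing as mp

def worker(queue: mp.Queue):
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    a = torch.rand([100, 100], device=device, dtype=torch.float)
    # .cpu() copies the data into host memory, so the Queue only has to
    # pickle an ordinary CPU tensor -- no CUDA memory sharing involved.
    queue.put(a.cpu())

if __name__ == '__main__':
    queue = mp.Queue()
    p = mp.Process(target=worker, args=(queue,))
    p.start()
    recv = queue.get()  # plain CPU tensor
    p.join()
    if torch.cuda.is_available():
        recv = recv.cuda()  # copy back onto the GPU in this process
    print(torch.sum(recv))
```

This pays for an extra device-to-host and host-to-device copy per tensor, so it's only a stopgap, not a replacement for real CUDA IPC.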