The following code doesn’t work when I try to pass CUDA tensors between two processes. I am following the approach suggested here: Multiprocessing package - torch.multiprocessing — PyTorch 1.10 documentation
import torch as tc

def enqueue(data, queue):
    for x in data:
        queue.put(x.cuda(non_blocking=True))

def main():
    tc.multiprocessing.set_start_method('spawn')
    data = [tc.randn(100, 100).share_memory_() for _ in range(100)]  # create 100 tensors in shared memory
    results = []
    queue = tc.multiprocessing.Queue(10)  # this is the queue I am going to use to pass cuda tensors around
    producer = tc.multiprocessing.Process(target=enqueue, args=(data, queue))
    producer.start()  # start producing cuda tensors from the child process
    while len(results) < len(data):
        x = queue.get()
        results.append(x.mean())
        del x
    producer.join()
    print(results)

if __name__ == '__main__':
    main()
And I am getting the following error in the main process when it calls queue.get():
Traceback (most recent call last):
  File "tmp.py", line 22, in <module>
    main()
  File "tmp.py", line 15, in main
    x = queue.get()
  File "/n/nix/tech/store/9fmi8gkrx3sv8l88pzz7hrrxn60hr769/lib/python3.7/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/n/nix/tech/store/i80m32g1v6r74vcwm4jz789p81g9933l/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 119, in rebuild_cuda_tensor
    event_sync_required)
RuntimeError: CUDA error: API call is not supported in the installed CUDA driver
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
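In case it helps to narrow things down, here is the smallest sketch I could think of that exercises the same path: a single CUDA tensor put on a torch.multiprocessing queue in a spawned child and fetched in the parent, which I would expect to go through the same rebuild_cuda_tensor code as above. The function name send_one and the tensor shape are just placeholders, not part of my real code:

import torch as tc

def send_one(queue):
    # move one tensor to the GPU in the child process and hand it to the parent
    queue.put(tc.randn(4, 4).cuda())

def main():
    tc.multiprocessing.set_start_method('spawn')
    queue = tc.multiprocessing.Queue(1)
    producer = tc.multiprocessing.Process(target=send_one, args=(queue,))
    producer.start()
    x = queue.get()  # rebuilds the CUDA tensor in the parent process
    print(x.sum())
    producer.join()

if __name__ == '__main__':
    main()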
Am I missing something? Thanks.
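In case the environment is relevant to the "API call is not supported in the installed CUDA driver" part of the message, this is the snippet I would use to print the versions PyTorch sees (the driver version itself I would read off nvidia-smi instead):

import torch as tc

print(tc.__version__)              # PyTorch version
print(tc.version.cuda)             # CUDA toolkit version PyTorch was built against
print(tc.cuda.is_available())      # whether a usable GPU/driver was found
print(tc.cuda.get_device_name(0))  # name of the first visible GPU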