Sharing CUDA tensors

The following code fails when I try to pass CUDA tensors between two processes. I am following the approach suggested in the "Multiprocessing package - torch.multiprocessing" page of the PyTorch 1.10 documentation.

import torch as tc

def enqueue(data, queue):
    for x in data:
        queue.put(x.cuda(non_blocking=True))

def main():
    tc.multiprocessing.set_start_method('spawn')
    data = [tc.randn(100, 100).share_memory_() for _ in range(100)]  # create 100 tensors on shared memory
    results = []
    queue = tc.multiprocessing.Queue(10)  # this is the queue I am going to use to pass cuda tensors around
    producer = tc.multiprocessing.Process(target=enqueue, args=(data, queue))
    producer.start()  # start producing cuda tensors from the child process 
    while len(results) < len(data):
        x = queue.get()
        results.append(x.mean())
        del x
    producer.join()
    print(results)

if __name__ == '__main__':
    main()

And I am getting the following error in the parent process, at the queue.get() call:

Traceback (most recent call last):
  File "tmp.py", line 22, in <module>
    main()
  File "tmp.py", line 15, in main
    x = queue.get()
  File "/n/nix/tech/store/9fmi8gkrx3sv8l88pzz7hrrxn60hr769/lib/python3.7/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/n/nix/tech/store/i80m32g1v6r74vcwm4jz789p81g9933l/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 119, in rebuild_cuda_tensor
    event_sync_required)
RuntimeError: CUDA error: API call is not supported in the installed CUDA driver
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Am I missing something? Thanks.

Python version: 3.7.6
Pytorch version: 1.10.2

I found that this issue was related to my CUDA driver. Problem solved.
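
For anyone hitting the same "API call is not supported in the installed CUDA driver" error: the post above does not say what exactly was wrong with the driver, so the following is only a minimal sketch for collecting the versions worth comparing. It assumes nvidia-smi is on the PATH (it normally ships with the NVIDIA driver), and collect_cuda_env is a hypothetical helper name, not part of PyTorch.

```python
import shutil
import subprocess


def collect_cuda_env():
    """Gather version details that help when CUDA IPC calls fail.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    info = {"torch": None, "cuda_runtime": None, "cuda_available": None, "driver": None}

    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None:
        info["torch"] = torch.__version__
        # CUDA version PyTorch was built against (None for CPU-only builds).
        info["cuda_runtime"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()

    # Ask nvidia-smi for the installed driver version, if the tool exists.
    smi = shutil.which("nvidia-smi")
    if smi is not None:
        try:
            out = subprocess.run(
                [smi, "--query-gpu=driver_version", "--format=csv,noheader"],
                capture_output=True, text=True, timeout=10, check=True,
            )
            lines = out.stdout.strip().splitlines()
            info["driver"] = lines[0].strip() if lines else None
        except (subprocess.SubprocessError, OSError):
            pass  # leave "driver" as None if the query fails

    return info


if __name__ == "__main__":
    for key, value in collect_cuda_env().items():
        print(f"{key}: {value}")
```

If the reported driver is older than what the CUDA runtime listed under cuda_runtime requires, updating the driver is a reasonable first thing to try. Whether that matches the fix found above is an assumption.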