Dear PyTorch Team,
Please help me find the root cause of the issue described below.
Issue description
Executing the following script with PyTorch 2.7 and CUDA 12.6:
import torch
import torch.multiprocessing as mp

def producer(shared_queue, producer_done, consumer_done):
    for i in range(1, 10):
        t = torch.tensor([i, i*2], device="cuda:0")
        print(f"#{i}: Tensor to send: {t}")
        shared_queue.put(t)
    producer_done.set()
    consumer_done.wait()

def consumer(shared_queue, producer_done, consumer_done):
    producer_done.wait()
    for i in range(1, 10):
        t = shared_queue.get()
        print(f"#{i}: Tensor received: {t}")
    consumer_done.set()

if __name__ == "__main__":
    mp.set_start_method("spawn", True)
    shared_queue = torch.multiprocessing.Queue()
    consumer_done = mp.Event()
    producer_done = mp.Event()
    producer_process = mp.Process(target=producer, args=(shared_queue, producer_done, consumer_done))
    consumer_process = mp.Process(target=consumer, args=(shared_queue, producer_done, consumer_done))
    producer_process.start()
    consumer_process.start()
    producer_process.join()
    consumer_process.join()
consistently produces the same output on both Windows and WSL2 (Ubuntu 24.04):
#1: Tensor to send: tensor([1, 2], device='cuda:0')
#2: Tensor to send: tensor([0, 0], device='cuda:0')
#3: Tensor to send: tensor([3, 6], device='cuda:0')
#4: Tensor to send: tensor([4, 8], device='cuda:0')
#5: Tensor to send: tensor([ 5, 10], device='cuda:0')
#6: Tensor to send: tensor([ 6, 12], device='cuda:0')
#7: Tensor to send: tensor([ 7, 14], device='cuda:0')
#8: Tensor to send: tensor([ 8, 16], device='cuda:0')
#9: Tensor to send: tensor([ 9, 18], device='cuda:0')
#1: Tensor received: tensor([0, 0], device='cuda:0')
#2: Tensor received: tensor([0, 0], device='cuda:0')
#3: Tensor received: tensor([3, 6], device='cuda:0')
#4: Tensor received: tensor([4, 8], device='cuda:0')
#5: Tensor received: tensor([ 5, 10], device='cuda:0')
#6: Tensor received: tensor([ 6, 12], device='cuda:0')
#7: Tensor received: tensor([ 7, 14], device='cuda:0')
#8: Tensor received: tensor([ 8, 16], device='cuda:0')
#9: Tensor received: tensor([ 9, 18], device='cuda:0')
The first two received tensors are zeroed, and it is not only on the receiving side: the second tensor is already zeroed on the producer side when it is printed before sending. The same script runs without issues on a native Ubuntu setup.
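One detail that may help narrow this down: the "Sharing CUDA tensors" note in the torch.multiprocessing documentation says the sending process must keep the original tensor alive for as long as the receiving process retains a copy. In the repro above, t is rebound on every iteration, so the producer itself only holds a Python reference to the last tensor it sent. Below is a hedged diagnostic variant of the producer that keeps every sent tensor alive and synchronizes the device before enqueueing (the consumer and the __main__ block stay exactly as in the repro; this is a sketch only and has not been verified on the affected Windows/WSL2 setups):

import torch

def producer(shared_queue, producer_done, consumer_done):
    # Diagnostic sketch: hold a reference to every sent tensor and make sure
    # the device work is finished before the tensor goes onto the queue, to
    # rule out the sender-side lifetime requirement from the docs.
    sent = []
    for i in range(1, 10):
        t = torch.tensor([i, i*2], device="cuda:0")
        torch.cuda.synchronize()   # ensure the tensor is fully written on the GPU
        print(f"#{i}: Tensor to send: {t}")
        shared_queue.put(t)
        sent.append(t)             # keep the original alive until the consumer is done
    producer_done.set()
    consumer_done.wait()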
Some findings
- The documentation (Windows FAQ — PyTorch 2.7 documentation) says that sharing CUDA tensors is not supported on Windows;
- There is a comment from peterjc123 that sharing CUDA tensors is not supported on Windows, but that was back in 2018;
- Unit tests for multiprocessing are disabled on Windows: pytorch/test/test_multiprocessing.py at 56b03df6ac5b4185a2b7b92f253565500a5b51ca · pytorch/pytorch · GitHub
- Open topics that refer to this issue:
  - Torch.multiprocessing on CUDA turns tensors to zeros
  - PyTorch multiprocessing with CUDA sets tensors to 0 - #11 by Skirlax
  - Issue with CUDA tensors shared between processes
  - Use torch.multiprocessing.queue with cuda tensor
  - PyTorch multiprocessing not sending CUDA tensors properly on Windows
- Open similar topics:
  - Best practice to share CUDA tensors across multiprocess
  - CUDA tensors on multiprocessing queue
  - DataLoader multiprocessing with Dataset returning a CUDA tensor
  - Sharing CUDA tensor
  - DataLoader: is returning CUDA tensors always bad in distributed training?
  - Synchronization of CUDA operations between `multiprocess` processes
  - Allocate cuda tensor in subprocess - #5 by florin
  - Multiprocessing CUDA memory - #12 by PatrickNercessian
  - Using CUDA IPC memory handles in pytorch - #2 by colesbury
  - A call to torch.cuda.is_available makes an unrelated multi-processing computation crash?
- Relevant GitHub issues:
  - torch.multiprocessing subprocess receives tensor with zeros rather than actual data · Issue #1015 · pytorch/examples · GitHub
  - Cuda tensor is zero when passed through multiprocessing queue · Issue #84994 · pytorch/pytorch · GitHub
  - Problems with initial communication between GPUs · Issue #56771 · pytorch/pytorch · GitHub
  - Parameters of cuda module zero out when used in multiprocessing · Issue #109094 · pytorch/pytorch · GitHub
  - Data corruption when reading data as CUDA tensor from a different process · Issue #134273 · pytorch/pytorch · GitHub
  - Unexpected behaviour with shared modules in multiprocessing on WSL2 · Issue #112340 · pytorch/pytorch · GitHub
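For comparison only: a common suggestion in threads like those above is to avoid CUDA IPC entirely by staging the tensors through CPU memory and moving them back to the GPU in the receiving process. A sketch of that variant (reusing the __main__ block from the repro; not verified on the affected Windows/WSL2 setups) is shown below. It sidesteps rather than explains the issue, so it is included only as a data point.

import torch

def producer(shared_queue, producer_done, consumer_done):
    for i in range(1, 10):
        t = torch.tensor([i, i*2], device="cuda:0")
        print(f"#{i}: Tensor to send: {t}")
        shared_queue.put(t.cpu())   # send a CPU copy instead of sharing the CUDA tensor
    producer_done.set()
    consumer_done.wait()

def consumer(shared_queue, producer_done, consumer_done):
    producer_done.wait()
    for i in range(1, 10):
        t = shared_queue.get().to("cuda:0")   # move back to the GPU on the receiving side
        print(f"#{i}: Tensor received: {t}")
    consumer_done.set()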
Thank you in advance for your assistance.