Windows & WSL2: zeroed CUDA tensors in spawned processes

Dear PyTorch Team,

Please help me find the root cause of the issue described below.

Issue description

Executing the following script with PyTorch 2.7 and CUDA 12.6:

import torch
import torch.multiprocessing as mp

def producer(shared_queue, producer_done, consumer_done):
    for i in range(1, 10):
        t = torch.tensor([i, i*2], device="cuda:0")
        print(f"#{i}: Tensor to send: {t}")
        shared_queue.put(t)
    producer_done.set()
    consumer_done.wait()

def consumer(shared_queue, producer_done, consumer_done):
    producer_done.wait()
    for i in range(1, 10):
        t = shared_queue.get()
        print(f"#{i}: Tensor received: {t}")

    consumer_done.set()

if __name__ == "__main__":

    mp.set_start_method("spawn", force=True)

    shared_queue = mp.Queue()
    consumer_done = mp.Event()
    producer_done = mp.Event()

    producer_process = mp.Process(target=producer, args=(shared_queue, producer_done, consumer_done))
    consumer_process = mp.Process(target=consumer, args=(shared_queue, producer_done, consumer_done))

    producer_process.start()
    consumer_process.start()
    producer_process.join()
    consumer_process.join()

consistently produces the same output on both Windows and WSL2 (Ubuntu 24.04):

#1: Tensor to send: tensor([1, 2], device='cuda:0')
#2: Tensor to send: tensor([0, 0], device='cuda:0')
#3: Tensor to send: tensor([3, 6], device='cuda:0')
#4: Tensor to send: tensor([4, 8], device='cuda:0')
#5: Tensor to send: tensor([ 5, 10], device='cuda:0')
#6: Tensor to send: tensor([ 6, 12], device='cuda:0')
#7: Tensor to send: tensor([ 7, 14], device='cuda:0')
#8: Tensor to send: tensor([ 8, 16], device='cuda:0')
#9: Tensor to send: tensor([ 9, 18], device='cuda:0')
#1: Tensor received: tensor([0, 0], device='cuda:0')
#2: Tensor received: tensor([0, 0], device='cuda:0')
#3: Tensor received: tensor([3, 6], device='cuda:0')
#4: Tensor received: tensor([4, 8], device='cuda:0')
#5: Tensor received: tensor([ 5, 10], device='cuda:0')
#6: Tensor received: tensor([ 6, 12], device='cuda:0')
#7: Tensor received: tensor([ 7, 14], device='cuda:0')
#8: Tensor received: tensor([ 8, 16], device='cuda:0')
#9: Tensor received: tensor([ 9, 18], device='cuda:0')
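
Zeroed entries are easy to miss when scanning this log by eye, so the pattern can be checked mechanically, which is handy when rerunning the script on Windows, WSL2 and native Ubuntu. The helper below is a hypothetical pure-Python sketch (the name find_mismatches is made up for illustration). It parses lines in the format printed by the script and reports, for each direction, the iteration numbers whose pair differs from the expected [i, 2*i].

```python
import re
from collections import defaultdict

# Matches lines such as: #5: Tensor received: tensor([ 5, 10], device='cuda:0')
LINE_RE = re.compile(
    r"#(\d+): Tensor (to send|received): tensor\(\[\s*(-?\d+),\s*(-?\d+)\]"
)


def find_mismatches(log: str) -> dict[str, list[int]]:
    """Return, per direction, the iteration numbers whose pair is not [i, 2*i]."""
    bad: dict[str, list[int]] = defaultdict(list)
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # ignore unrelated output lines
        i, direction = int(m.group(1)), m.group(2)
        a, b = int(m.group(3)), int(m.group(4))
        if (a, b) != (i, 2 * i):
            bad[direction].append(i)
    return dict(bad)


if __name__ == "__main__":
    # A few lines copied from the output above.
    sample = """\
#1: Tensor to send: tensor([1, 2], device='cuda:0')
#2: Tensor to send: tensor([0, 0], device='cuda:0')
#1: Tensor received: tensor([0, 0], device='cuda:0')
#2: Tensor received: tensor([0, 0], device='cuda:0')
#3: Tensor received: tensor([3, 6], device='cuda:0')
"""
    print(find_mismatches(sample))  # {'to send': [2], 'received': [1, 2]}
```

Applied to the full log above, it reports iteration 2 on the sending side and iterations 1 and 2 on the receiving side.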

The first two received tensors are zeroed. Moreover, the second tensor is already zeroed on the sending side: the producer prints [0, 0] for it before putting it into the queue. The same script works correctly on a native Ubuntu setup.
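
If the zeroing is related to sharing CUDA memory between processes, keeping the tensors in CPU memory while they cross the queue should make it disappear. The sketch below is an untested diagnostic variant of the script above, not a confirmed fix: the producer creates CPU tensors, and the consumer copies each one to cuda:0 only after receiving it and checks it against the expected [i, 2*i]. Whether this avoids the problem on Windows and WSL2 is an assumption to verify.

```python
import torch
import torch.multiprocessing as mp


def producer(shared_queue, producer_done, consumer_done):
    for i in range(1, 10):
        # Diagnostic change: the tensor stays in CPU memory, so the queue
        # shares host memory instead of a CUDA allocation.
        t = torch.tensor([i, i * 2])
        print(f"#{i}: Tensor to send: {t}")
        shared_queue.put(t)
    producer_done.set()
    # Keep the producer alive until the consumer has read every tensor.
    consumer_done.wait()


def consumer(shared_queue, producer_done, consumer_done):
    producer_done.wait()
    for i in range(1, 10):
        # Move to the GPU only after the tensor has been received.
        t = shared_queue.get().to("cuda:0")
        status = "ok" if t.tolist() == [i, i * 2] else "MISMATCH"
        print(f"#{i}: Tensor received: {t} [{status}]")
    consumer_done.set()


if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)

    shared_queue = mp.Queue()
    consumer_done = mp.Event()
    producer_done = mp.Event()

    args = (shared_queue, producer_done, consumer_done)
    producer_process = mp.Process(target=producer, args=args)
    consumer_process = mp.Process(target=consumer, args=args)

    producer_process.start()
    consumer_process.start()
    producer_process.join()
    consumer_process.join()
```

If every line prints [ok] here while the original script still shows zeroed tensors, that points toward the CUDA-tensor sharing path rather than the queue or the events themselves.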

Some findings

Thank you in advance for your assistance.