DataParallel converts tensor values to zero

Dear all,

I am using PyTorch 1.13.1 to run this simple code:

import torch


class DataGen(torch.utils.data.Dataset):

    def __init__(self, n, L):
        super(DataGen, self).__init__()
        self.n = n
        self.L = L

    def __getitem__(self, item):
        return {'data': torch.randn(self.n)}

    def __len__(self):
        return self.L


class Network(torch.nn.Module):

    def __init__(self, n):
        super(Network, self).__init__()
        self.layer = torch.nn.Linear(in_features=n, out_features=1)

    def forward(self, x):
        print(f'\tData processed:{x}')
        return self.layer(x['data'])


# Parameters
n = 2
L = 12
batch_size = 4

# Data
dataset = DataGen(n, L)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size)

# Device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
device = torch.device(device)

# Model
model = Network(n).to(device)
if torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model)

# Train
model.train()
for i, batch in enumerate(dataloader):
    batch = {k: v.to(device) for k, v in batch.items()}
    print(f'\nBatch {i}:\n\tData loaded:{batch}')
    pred = model(batch)

It prints this:

Batch 0:
	Data loaded:{'data': tensor([[ 0.0313,  0.7014],
        [-0.1613, -1.0289],
        [ 0.4327,  0.4148],
        [ 1.2195, -0.8426]], device='cuda:0')}
	Data processed:{'data': tensor([[0., 0.],
        [0., 0.]], device='cuda:1')}
	Data processed:{'data': tensor([[ 0.0313,  0.7014],
        [-0.1613, -1.0289]], device='cuda:0')}
Batch 1:
	Data loaded:{'data': tensor([[-0.3293,  2.3024],
        [-0.4908, -1.0065],
        [-0.4675, -0.1143],
        [ 0.0790,  0.0789]], device='cuda:0')}
	Data processed:{'data': tensor([[0., 0.],
        [0., 0.]], device='cuda:1')}
	Data processed:{'data': tensor([[-0.3293,  2.3024],
        [-0.4908, -1.0065]], device='cuda:0')}
Batch 2:
	Data loaded:{'data': tensor([[ 0.2400, -0.3636],
        [ 1.8705, -1.0880],
        [-1.5622, -1.8931],
        [-0.5770,  0.0298]], device='cuda:0')}
	Data processed:{'data': tensor([[0., 0.],
        [0., 0.]], device='cuda:1')}
	Data processed:{'data': tensor([[ 0.2400, -0.3636],
        [ 1.8705, -1.0880]], device='cuda:0')}

Note that when DataParallel splits the original batch, the values of the chunk assigned to cuda:1 become 0. Why is this happening?

Thanks in advance

Your system most likely has issues transferring data to GPU1. Check this manually by sending tensors from GPU0 to GPU1 in a loop. If these tensors also contain unexpected values, check whether IOMMU needs to be disabled, or whether p2p is not supported in your setup and should also be disabled.
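
A minimal sketch of such a manual check, assuming at least two visible CUDA devices. The helper name transfer_is_intact and its parameters are made up for illustration. It copies random tensors from one device to another and compares both copies on the CPU, so the comparison does not rely on the GPU-to-GPU path under test:

```python
import torch


def transfer_is_intact(src, dst, shape=(4, 2), n_trials=10):
    """Return True if every tensor copied from src to dst keeps its values."""
    for _ in range(n_trials):
        original = torch.randn(shape, device=src)
        copied = original.to(dst)
        # Bring both tensors to the CPU before comparing, so the check does
        # not depend on a second device-to-device copy.
        if not torch.equal(original.cpu(), copied.cpu()):
            return False
    return True


if __name__ == "__main__":
    if torch.cuda.device_count() >= 2:
        ok = transfer_is_intact(torch.device("cuda:0"), torch.device("cuda:1"))
        print("cuda:0 -> cuda:1 transfer intact:", ok)
    else:
        print("Fewer than two CUDA devices visible; nothing to check.")
```

If this prints False, the copy itself is corrupting data and DataParallel is only exposing the problem.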

Thanks for your quick answer. I have tried this:

for i, batch in enumerate(dataloader):

   # Batch in GPU 0
   device = torch.device('cuda:0')
   batch = {k: v.to(device) for k, v in batch.items()}
   print(f'\nBatch {i}:\n\tBatch in GPU 0:\n\t{batch}')

   # Move batch to GPU 1
   device = torch.device('cuda:1')
   batch = {k: v.to(device) for k, v in batch.items()}
   print(f'\tBatch in GPU 1:\n\t{batch}')

   # Move batch back to GPU 0
   device = torch.device('cuda:0')
   batch = {k: v.to(device) for k, v in batch.items()}
   print(f'\tBatch in GPU 0:\n\t{batch}')

And the output was:

Batch 0:
	Batch in GPU 0:
	{'data': tensor([[ 0.2519, -0.3811],
        [ 1.4344, -0.1266],
        [-0.0730,  0.2266],
        [ 0.5903,  1.0903]], device='cuda:0')}
	Batch in GPU 1:
	{'data': tensor([[0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.]], device='cuda:1')}
	Batch in GPU 0:
	{'data': tensor([[ 0.2519, -0.3811],
        [ 1.4344, -0.1266],
        [-0.0730,  0.2266],
        [ 0.5903,  1.0903]], device='cuda:0')}
Batch 1:
	Batch in GPU 0:
	{'data': tensor([[-0.2343,  0.8752],
        [ 0.4204, -0.8245],
        [ 0.1555,  1.6178],
        [-0.6139, -0.4224]], device='cuda:0')}
	Batch in GPU 1:
	{'data': tensor([[0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.]], device='cuda:1')}
	Batch in GPU 0:
	{'data': tensor([[-0.2343,  0.8752],
        [ 0.4204, -0.8245],
        [ 0.1555,  1.6178],
        [-0.6139, -0.4224]], device='cuda:0')}
Batch 2:
	Batch in GPU 0:
	{'data': tensor([[ 0.3432,  1.4844],
        [ 0.0526, -1.9575],
        [ 0.4190,  0.3634],
        [-1.1192,  1.8027]], device='cuda:0')}
	Batch in GPU 1:
	{'data': tensor([[0., 0.],
        [0., 0.],
        [0., 0.],
        [0., 0.]], device='cuda:1')}
	Batch in GPU 0:
	{'data': tensor([[ 0.3432,  1.4844],
        [ 0.0526, -1.9575],
        [ 0.4190,  0.3634],
        [-1.1192,  1.8027]], device='cuda:0')}

So you are right. To disable IOMMU I guess I have to access the BIOS, which is not possible until Monday since I am connected to my PC remotely. In any case, how can I check whether IOMMU and p2p need to be disabled?

By the way, I am using 2 x 4090 GPUs on Ubuntu 22.04.

You could run your test script via NCCL_P2P_DISABLE=1 python script.py args and check whether the values are transferred correctly.
If this helps, you might need to update your NVIDIA driver.
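
To see what the driver reports about p2p, something like the sketch below might help. It uses torch.cuda.can_device_access_peer; the helper name peer_access_report is hypothetical. Note the assumption here: this only shows whether peer access is reported as available for each pair of GPUs, not whether transfers over that path actually deliver correct data.

```python
import torch


def peer_access_report():
    """Map each ordered GPU pair (i, j) to whether i reports peer access to j."""
    n = torch.cuda.device_count()
    return {
        (i, j): torch.cuda.can_device_access_peer(i, j)
        for i in range(n)
        for j in range(n)
        if i != j
    }


if __name__ == "__main__":
    report = peer_access_report()
    if not report:
        print("Fewer than two CUDA devices visible.")
    for (i, j), ok in report.items():
        print(f"cuda:{i} -> cuda:{j}: peer access reported = {ok}")
```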

It still does not transfer the values correctly. Besides, my NVIDIA driver is already at version 545.23.06, which I think is the latest one.

In this case, you might need to wait until next week and check IOMMU as described here.

Ok, thanks for your help!

Update: I tried disabling IOMMU, but the problem persisted. Downgrading nvidia-driver to 535.129.03 solved the problem.
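
For anyone comparing their setup against this thread, here is a small sketch for reading the installed driver version from Python. It assumes nvidia-smi is on the PATH and returns None otherwise; the helper name nvidia_driver_versions is made up for illustration. The two version numbers mentioned above are only what was observed on this one machine.

```python
import shutil
import subprocess


def nvidia_driver_versions():
    """Return the driver version reported per GPU, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    # nvidia-smi prints one line per GPU; drop any empty lines.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    print("Driver version(s):", nvidia_driver_versions())
```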