Sending a tensor to multiple GPUs

I have a DataParallel model with a tensor attribute that I need to define after I wrap the model in DataParallel. While the model has cuda device_ids = [0, 1] as expected, the tensor I assign to the model is on cuda:0 only, so it is not copied to all devices when I send it to the model. Is it possible to have this tensor available on both devices?

To copy the tensor to the GPU you can use data = data.to(device).

Details: https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html
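
For context, the pattern from that tutorial looks roughly like the sketch below; the ToyNet module is only a placeholder for illustration, not your actual model.

    import torch
    import torch.nn as nn

    device = torch.device("cuda:0")

    class ToyNet(nn.Module):          # placeholder model, just for illustration
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(10, 2)

        def forward(self, x):
            return self.fc(x)

    model = nn.DataParallel(ToyNet(), device_ids=[0, 1]).to(device)

    data = torch.randn(8, 10)
    data = data.to(device)    # move the batch to the primary device;
                              # DataParallel then scatters it across device_ids
    output = model(data)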

That would just move the tensor to a single device, though. I want the tensor to be on both GPUs, cuda:0 and cuda:1, just like the rest of the DataParallel model.

Where do you create this tensor? If it is inside the wrapped model’s forward, then, assuming x is the input to the model, you can use x.new_tensor(), x.new_zeros(), or torch.zeros_like() to create the tensor on the correct device automatically.

See here: https://pytorch.org/docs/stable/tensors.html#torch.Tensor.new_zeros
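
As a rough sketch of that approach (again with a toy module standing in for the real one), creating the tensor from x inside forward puts it on whichever device the current replica runs on:

    import torch
    import torch.nn as nn

    class ToyNet(nn.Module):          # toy module, just for illustration
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(10, 2)

        def forward(self, x):
            # Built from x, so it inherits x's device and dtype in every replica
            mask = x.new_zeros(x.size(0), 2)
            # torch.zeros_like would likewise follow the device of its argument
            return self.fc(x) + mask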

If you are assigning the attribute from the outside, e.g. model.tensor = some_tensor, then this is not going to work. The model is replicated to each device on every forward pass. Doing model.tensor = some_tensor does not assign the attribute to the real model inside the wrapper but directly to the DataParallel instance.
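
To make the distinction concrete, here is a minimal sketch; the wrapped nn.Linear is just a stand-in for the real model:

    import torch
    import torch.nn as nn

    model = nn.DataParallel(nn.Linear(10, 2), device_ids=[0, 1])

    # Sets the attribute on the DataParallel wrapper, not on the wrapped module
    model.tensor = torch.ones(3)

    # Reaches the real module, but a plain Python attribute is still ignored
    # when DataParallel replicates the module, so the per-GPU copies won't see it
    model.module.tensor = torch.ones(3)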

What I am doing is this: in every batch, I assign the tensor (named graph) to the module wrapped in DataParallel, based on a criterion. This tensor differs from batch to batch and should not be split across devices like the inputs, which is why I am trying to assign it directly to the model:

        for batch_idx, (x, y, graph, subject) in enumerate(self.train_loader):

            if model.module.subject != subject:
                model.module.subject = subject
                model.module.graph = graph      # where the assignment takes place

            output = model(x.to(self.device))
            target = torch.argmax(y, dim=1)

            optimizer.zero_grad()
            loss = F.nll_loss(output, target, weight=self.w)
            loss.backward()
            optimizer.step()

Given that I assign the tensor to the module wrapped by DataParallel, I would expect this tensor to be replicated on each GPU as well, which does not seem to be the case. This might have to do with the replication process:

When exactly does the replication happen? And if this happens in every iteration, shouldn’t the model replicate this tensor as well?

Try using self.register_buffer('graph', None) inside __init__ of the model. This way DataParallel knows that this is a tensor that must be copied too. DataParallel only replicates parameters and buffers.
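
A minimal sketch of that suggestion, with GraphNet standing in for the actual model:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GraphNet(nn.Module):        # illustrative stand-in for the real model
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(10, 2)
            self.subject = None
            # Registering 'graph' as a buffer (None is allowed initially) tells
            # nn.Module it is state, so DataParallel copies it to every replica
            self.register_buffer('graph', None)

        def forward(self, x):
            # self.graph is available here, on the same device as x, in each replica
            return F.log_softmax(self.fc(x), dim=1)

    model = nn.DataParallel(GraphNet(), device_ids=[0, 1]).to('cuda:0')

    # Later assignments update the registered buffer; keep it on the primary
    # device, since DataParallel expects parameters and buffers on device_ids[0]
    model.module.graph = torch.randn(5, 5).to('cuda:0')
    output = model(torch.randn(8, 10).to('cuda:0'))

Because graph is already registered in __init__, the later model.module.graph = graph assignment in your training loop updates the buffer rather than creating a plain attribute.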

I think that solves it, thank you!

One issue: even though the outputs and the subsequent loss are now calculated, the code gets stuck on the backward pass. I am not entirely sure whether this issue is directly related, though I do not encounter it in the single-GPU case.

That could be an issue with your setup, maybe the hardware. I don’t think it is related to PyTorch.