Hi,

I’m using `DataParallel` to do multi-GPU training. I need to initialize a `torch.zeros` tensor in my model’s `forward` function, to which several logits (calculated from the input tensors) are then added. The code is as follows:

```
# accumulator for the logit terms, pinned to self.device (cuda:0)
linear_logit = torch.zeros([X.shape[0], 1]).to(self.device)
...
# each logit below is computed from the inputs, on the replica's GPU
linear_logit += sparse_feat_logit
...
linear_logit += dense_value_logit
```

The parallelization code:

```
# the model is already on cuda:0 (device_ids[0]) before wrapping
model = torch.nn.DataParallel(model, device_ids=[0, 1])
```

Then I got this error:

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
```

This is because the input data and `sparse_feat_logit` are on cuda:1, but `linear_logit` is on cuda:0 (`self.device` is cuda:0, since the model has to be on cuda:0 before it is wrapped in `torch.nn.DataParallel`).
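
To make the failure concrete, here is a minimal sketch of the situation (`ToyModel` and `fc` are simplified stand-ins for my real model):

```
import torch
import torch.nn as nn

class ToyModel(nn.Module):  # simplified stand-in for my real model
    def __init__(self, device):
        super().__init__()
        self.device = device        # fixed to cuda:0 at construction time
        self.fc = nn.Linear(8, 1)   # placeholder layer

    def forward(self, X):
        # allocated on cuda:0 in *every* replica, because self.device is a
        # plain attribute that DataParallel copies unchanged when replicating
        linear_logit = torch.zeros([X.shape[0], 1]).to(self.device)
        # self.fc's weights live on the replica's own GPU, so on the cuda:1
        # replica this mixes cuda:0 and cuda:1 tensors
        linear_logit += self.fc(X)
        return linear_logit

model = ToyModel(device=torch.device('cuda:0')).to('cuda:0')
model = torch.nn.DataParallel(model, device_ids=[0, 1])
out = model(torch.randn(16, 8))  # RuntimeError: ... cuda:0 and cuda:1!
```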

I tried to get the current GPU from the input tensors, but I ran into other errors because some of the inputs may be empty for a given batch.
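
What I tried looked roughly like this (reconstructed from memory; `sparse_feat` stands for one of my real inputs):

```
# inside forward: infer the replica's GPU from one of the input tensors
device = sparse_feat.device  # breaks when sparse_feat is empty/missing for a batch
linear_logit = torch.zeros([X.shape[0], 1], device=device)
```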

I would like to know how to get the current GPU inside `forward` without relying on the input tensors. Maybe there is a function or property on the model itself?
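
For example, would something along these lines be reliable under `DataParallel`? Both calls exist in PyTorch; I’m just guessing at which one fits here:

```
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 1)  # placeholder layer

    def forward(self, X):
        # option 1: the replica's parameters live on the replica's own GPU
        device = next(self.parameters()).device
        # option 2 (if DataParallel sets the replica's GPU as the current
        # CUDA device while running forward):
        # device = torch.device('cuda', torch.cuda.current_device())
        linear_logit = torch.zeros([X.shape[0], 1], device=device)
        return linear_logit + self.fc(X)
```

Thank you!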