Something wrong while using .cuda() in "forward" with DataParallel: arguments are located on different GPUs

534441921 · October 21, 2020, 2:14am

The following is a part of my code:
It works well on a single GPU, but I need to use multiple GPUs, and something goes wrong when I call .cuda() inside “forward” with DataParallel. Even something like ‘epsilon = self.normal.sample(self.mu.size()).cuda(self.mu.device)’ still does not send the tensor to the right GPU, and register_buffer does not help because it only works at initialization. I would really appreciate your help!

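import math

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import Parameter
from torch.nn.init import _calculate_fan_in_and_fan_out

class Gaussian(object):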
    def __init__(self, mu, rho):
        super().__init__()
        self.mu = mu
        self.rho = rho
        self.normal = torch.distributions.Normal(0, 1)

    @property
    def sigma(self):
        return torch.log1p(torch.exp(self.rho))

    def sample(self):
        epsilon = self.normal.sample(self.mu.size()).cuda()   # This is where the error happens !
        return self.mu + self.sigma * epsilon

class SharableLinear(nn.Module):
    """Modified linear layer."""
    __constants__ = ['bias', 'in_features', 'out_features']

    def __init__(self, in_features, out_features, bias=True, ratio=0.5):
        super(SharableLinear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features

        # weight and bias are registered as Parameters.
        self.weight = Parameter(torch.Tensor(out_features, in_features), requires_grad=True)
        nn.init.normal_(self.weight, 0, 0.01)
        if bias:
            self.bias = Parameter(torch.Tensor(out_features), requires_grad=True)
            nn.init.constant_(self.bias, 0)
        else:
            self.register_parameter('bias', None)

        fan_in, _ = _calculate_fan_in_and_fan_out(self.weight)

        total_var = 2 / fan_in
        noise_var = total_var * ratio
        mu_var = total_var - noise_var

        noise_std, mu_std = math.sqrt(noise_var), math.sqrt(mu_var)
        rho_init = np.log(np.exp(noise_std) - 1)

        self.weight_rho = nn.Parameter(torch.Tensor(out_features, 1).uniform_(rho_init, rho_init))

        self.weight_gaussian = Gaussian(self.weight, self.weight_rho)

    def forward(self, input, sample=False):
        if sample:
            weight = self.weight_gaussian.sample()   # I have to resample the weight inside forward, which means .cuda() has to be used
        else:
            weight = self.weight

        return F.linear(input, weight, self.bias)

albanD · October 21, 2020, 2:20pm

Hi,

DataParallel replicates your model to run on multiple GPUs, so different copies of your model will be located on different GPUs.
But when you call .cuda() without an argument, the tensor goes to the current CUDA device (GPU 0 by default), so any copy whose tensors live on another GPU will have problems because you give it a Tensor on the wrong GPU.
You can replace it with .to(self.mu.device) to be sure to always place it on the same device as the other Tensors for that copy.
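
A minimal sketch of that suggestion, assuming `mu` and `rho` are ordinary tensors. The helper name `sample_weight` is hypothetical and not from the original code. `torch.randn_like` creates the noise directly on the device of its argument, so no explicit `.cuda()` call is needed. Note that this only helps when `mu` itself lives on the replica's device.

```python
import torch
import torch.nn.functional as F


def sample_weight(mu, rho):
    """Return mu + sigma * epsilon, with epsilon created on mu's device."""
    sigma = F.softplus(rho)          # log(1 + exp(rho)), computed in a numerically stable way
    epsilon = torch.randn_like(mu)   # standard normal noise; same shape, dtype, and device as mu
    return mu + sigma * epsilon


mu = torch.zeros(4, 3)
rho = torch.full((4, 1), -3.0)       # one rho per output row, broadcast across columns
weight = sample_weight(mu, rho)
print(weight.shape, weight.device)
```

Because the noise follows `mu`, the same code runs unchanged on CPU, on a single GPU, or on whichever GPU holds `mu`.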

534441921 · October 22, 2020, 6:42am

Hi,
Many thanks for your reply!
When I changed .cuda() to .cuda(self.mu.device) or .to(self.mu.device), it still raises RuntimeError: arguments are located on different GPUs.

        epsilon = self.normal.sample(self.mu.size()).to(self.mu.device)

Here are some details.

  File "/home/bzg/anaconda3/envs/torch1.2/lib/python3.7/site-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 1 on device 1.



    return F.linear(input, weight, self.bias)
  File "/home/bzg/anaconda3/envs/torch1.2/lib/python3.7/site-packages/torch/nn/functional.py", line 1371, in linear
    output = input.matmul(weight.t())
RuntimeError: arguments are located on different GPUs at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THC/generic/THCTensorMathBlas.cu:260

534441921 · October 22, 2020, 6:53am

My closest attempt is to make Gaussian inherit from nn.Module and turn sample() into forward():

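class SharableLinear(nn.Module):
    # __init__ is unchanged from the version above.
    def forward(self, input, sample=False):
        if sample:
            weight = self.weight_gaussian()   # calls Gaussian.forward()
        else:
            weight = self.weight

        return F.linear(input, weight, self.bias)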

class Gaussian(nn.Module):
    def __init__(self, mu, rho):
        super().__init__()
        self.mu = mu
        self.rho = rho
        self.normal = torch.distributions.Normal(0, 1)

    @property
    def sigma(self):
        return torch.log1p(torch.exp(self.rho))

    def forward(self):
        epsilon = self.normal.sample(self.mu.size()).cuda()
        return self.mu + 0.1 * self.sigma * epsilon

This time there is no error and the code runs, but a warning remains:

This probably still has a bad effect. Do you have any ideas?
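
A possible explanation for why the nn.Module version behaves differently, offered as an assumption rather than something confirmed in this thread: DataParallel copies registered parameters, buffers, and submodules to each GPU, while a plain Python attribute is shared as-is and keeps pointing at the original tensors. The sketch below only shows the registration difference and runs on CPU. All class names in it are hypothetical.

```python
import torch
import torch.nn as nn


class PlainGaussian(object):
    """Plain Python object: the owning nn.Module does not track what it holds."""
    def __init__(self, mu):
        self.mu = mu


class ModuleGaussian(nn.Module):
    """nn.Module: assigning a Parameter registers it with this module."""
    def __init__(self, mu):
        super().__init__()
        self.mu = mu


class Layer(nn.Module):
    def __init__(self, helper_cls):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(2, 2))
        self.helper = helper_cls(self.weight)


plain = Layer(PlainGaussian)
module = Layer(ModuleGaussian)

# Only the nn.Module helper is registered as a child of the layer.
plain_children = [name for name, _ in plain.named_children()]
module_children = [name for name, _ in module.named_children()]
print(plain_children)    # []
print(module_children)   # ['helper']
```

If this reading is right, a helper that must follow its layer across devices should be an nn.Module (or take the tensors as arguments in forward) rather than hold references to them in a plain object.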

534441921 · October 22, 2020, 7:07am

If the warning were about the losses generated on different GPUs, I could simply call loss.mean(). But I have no idea how to handle this problem.

albanD · October 22, 2020, 2:16pm

The warning seems to say that your forward returns a scalar, which cannot be concatenated directly, so each scalar is made into a 1D Tensor with 1 element and then they are concatenated.
This is fine.
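
The behaviour described above can be imitated on CPU. This is only an illustration with made-up values standing in for per-replica outputs, not DataParallel's actual gather code.

```python
import torch

# Stand-ins for the scalar (0-dim) outputs of two replicas.
per_replica = [torch.tensor(0.7), torch.tensor(0.9)]

# 0-dim tensors cannot be concatenated directly, so each one gets a leading dimension first.
gathered = torch.cat([t.unsqueeze(0) for t in per_replica])
print(gathered.shape)    # one entry per replica

# If the values are losses, reduce them back to a scalar before calling backward().
loss = gathered.mean()
print(loss.dim())
```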

534441921 · October 23, 2020, 7:07am

Thank you for your reply! The existence of this warning still worries me, maybe I’ll just have to postpone that.

albanD · October 23, 2020, 2:44pm

You can call .view(1) or .unsqueeze(0) on the return value of your forward to get something that is 1D and silence the warning.
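
A small sketch of that idea, assuming the scalar in question is a loss computed inside forward. The thread does not show which forward returns the scalar, so `LossModel` below is a hypothetical stand-in.

```python
import torch
import torch.nn as nn


class LossModel(nn.Module):
    """Hypothetical model that computes its loss inside forward."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 1)

    def forward(self, x, target):
        prediction = self.linear(x).squeeze(1)
        loss = ((prediction - target) ** 2).mean()   # 0-dim tensor
        return loss.view(1)                          # 1D tensor with one element


model = LossModel()
out = model(torch.randn(4, 3), torch.randn(4))
print(out.shape)
```

The caller can still reduce the result with `out.mean()` before calling `backward()`.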

534441921 · October 24, 2020, 6:07am

Thanks again, but I need to return a 2D tensor, since self.mu is 2D and sigma is 1D. Besides, using self.sigma.expand(self.mu.size()) or sigma.unsqueeze(1) still does not fix the warning. I don’t even know what this warning refers to. The changed code still works well on a single GPU, so the problem must be in DataParallel; maybe I should learn more about its mechanism first.
The only information is

stark · December 20, 2022, 7:48am

@albanD Thank you for the answer.

Could .to(torch.cuda.current_device()) also be used for this purpose? It seems helpful in situations where a variable on the right GPU isn’t readily available.
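
One way to combine both options is sketched below: prefer a reference tensor when one is at hand, and fall back to the current CUDA device otherwise. The helper name `resolve_device` is hypothetical. Whether the current device inside a DataParallel replica's forward matches that replica's GPU is an assumption to verify for your PyTorch version.

```python
import torch


def resolve_device(reference=None):
    """Pick a device: the reference tensor's if given, else the current CUDA device, else CPU."""
    if reference is not None:
        return reference.device
    if torch.cuda.is_available():
        return torch.device("cuda", torch.cuda.current_device())
    return torch.device("cpu")


# With a reference tensor, the new tensor follows it.
noise = torch.randn(2, 2, device=resolve_device(torch.zeros(1)))
print(noise.device)

# Without one, the helper falls back to the current CUDA device (or CPU if CUDA is unavailable).
fallback = resolve_device()
print(fallback)
```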