My code works well on a single GPU, but I need to use multiple GPUs, and something goes wrong when I call .cuda() inside "forward" under DataParallel. Even something like 'epsilon = self.normal.sample(self.mu.size()).cuda(self.mu.device())' still doesn't send the tensor to the right GPU, and register_buffer only works for tensors created in __init__. I seriously need your help!
The following is part of my code:
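# Imports assumed by this snippet (not shown in the original post).
import math
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import Parameter
from torch.nn.init import _calculate_fan_in_and_fan_out

class Gaussian(object):
    def __init__(self, mu, rho):
        super().__init__()
        self.mu = mu
        self.rho = rho
        self.normal = torch.distributions.Normal(0, 1)

    @property
    def sigma(self):
        return torch.log1p(torch.exp(self.rho))

    def sample(self):
        epsilon = self.normal.sample(self.mu.size()).cuda()  # This is where the error happens!
        return self.mu + self.sigma * epsilon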
class SharableLinear(nn.Module):
    """Modified linear layer."""
    __constants__ = ['bias', 'in_features', 'out_features']

    def __init__(self, in_features, out_features, bias=True, ratio=0.5):
        super(SharableLinear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        # weight and bias are no longer Parameters.
        self.weight = Parameter(torch.Tensor(out_features, in_features), requires_grad=True)
        nn.init.normal_(self.weight, 0, 0.01)
        if bias:
            self.bias = Parameter(torch.Tensor(out_features), requires_grad=True)
            nn.init.constant_(self.bias, 0)
        else:
            self.register_parameter('bias', None)
        fan_in, _ = _calculate_fan_in_and_fan_out(self.weight)
        total_var = 2 / fan_in
        noise_var = total_var * ratio
        mu_var = total_var - noise_var
        noise_std, mu_std = math.sqrt(noise_var), math.sqrt(mu_var)
        rho_init = np.log(np.exp(noise_std) - 1)
        self.weight_rho = nn.Parameter(torch.Tensor(out_features, 1).uniform_(rho_init, rho_init))
        self.weight_gaussian = Gaussian(self.weight, self.weight_rho)

    def forward(self, input, sample=False):
        if sample:
            # I have to reset weight inside forward, which means .cuda() has to be used
            weight = self.weight_gaussian.sample()
        else:
            weight = self.weight
        return F.linear(input, weight, self.bias)
DataParallel replicates your model across multiple GPUs and splits each input batch between them, so different copies of your model are located on different GPUs.
But a bare .cuda() sends the Tensor to the current CUDA device, which is GPU 0 unless something has changed it, so any copy that doesn't live on GPU 0 can have problems because you give it a Tensor on the wrong GPU.
You can replace it with .to(self.mu.device) to always place the Tensor on the same device as the other Tensors for that copy.
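A minimal sketch of that suggestion, assuming the noise only needs to match mu in shape, dtype and device. The name sample_weight is a hypothetical helper, not part of the original code. It uses torch.randn_like, which creates the noise directly on mu's device, so no .cuda() or .to() call is needed:

```python
import torch


def sample_weight(mu, rho):
    """Return mu + sigma * epsilon, with epsilon created on mu's device."""
    sigma = torch.log1p(torch.exp(rho))  # softplus keeps sigma positive
    epsilon = torch.randn_like(mu)       # same shape, dtype and device as mu
    return mu + sigma * epsilon


# Illustrative values only: one rho per output row, broadcast over the columns.
mu = torch.zeros(4, 3)
rho = torch.full((4, 1), -3.0)
w = sample_weight(mu, rho)
```

Note that this only helps if mu itself is the tensor that belongs to the current copy of the model.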
Hi,
Many thanks for your reply!
When I changed .cuda() to .cuda(self.mu.device) or .to(self.mu.device), it still raises RuntimeError: arguments are located on different GPUs.
File "/home/bzg/anaconda3/envs/torch1.2/lib/python3.7/site-packages/torch/_utils.py", line 369, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 1 on device 1.
return F.linear(input, weight, self.bias)
File "/home/bzg/anaconda3/envs/torch1.2/lib/python3.7/site-packages/torch/nn/functional.py", line 1371, in linear
output = input.matmul(weight.t())
RuntimeError: arguments are located on different GPUs at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/THC/generic/THCTensorMathBlas.cu:260
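One possible explanation, judging only from the code posted above: Gaussian is a plain Python object rather than an nn.Module, so when DataParallel replicates the layer, every replica's weight_gaussian attribute still refers to the same helper, and its mu and rho still point at the original parameters on GPU 0. In that case self.mu.device resolves to GPU 0 in every replica, while the replica's input lives on another GPU. The sketch below avoids this by reading self.weight and self.weight_rho inside forward, so each replica uses its own copies. SampledLinear and its rho_init default are hypothetical, a simplified stand-in for SharableLinear rather than the author's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SampledLinear(nn.Module):
    """Hypothetical, simplified stand-in for SharableLinear."""

    def __init__(self, in_features, out_features, rho_init=-3.0):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.normal_(self.weight, 0, 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        # One rho per output row; rho_init here is an illustrative constant.
        self.weight_rho = nn.Parameter(torch.full((out_features, 1), rho_init))

    def forward(self, input, sample=False):
        weight = self.weight
        if sample:
            # Parameters are looked up on self inside forward, so a replica
            # reads its own copies instead of references stored at init time.
            sigma = torch.log1p(torch.exp(self.weight_rho))
            weight = weight + sigma * torch.randn_like(weight)
        return F.linear(input, weight, self.bias)


layer = SampledLinear(8, 4)
x = torch.randn(2, 8)
out_mean = layer(x)
out_sampled = layer(x, sample=True)
```

The example runs on CPU as written. Wrapping the layer in nn.DataParallel on a multi-GPU machine is the case it is meant for, but that is not exercised here.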
The warning seems to say that your forward returns a scalar, which cannot be concatenated directly, so each scalar is turned into a 1D Tensor with one element and the results are then concatenated.
This is fine.
Thanks again, but I need to return a 2D tensor, since self.mu is 2D and sigma is 1D. Besides, using self.sigma.expand(self.mu.size()) or sigma.unsqueeze(1) still doesn't fix the warning, and I don't even know what the warning refers to. The changed code still works well on a single GPU, so the problem must be in DataParallel; maybe I should learn more about its mechanism first.
The only information is
Could .to(torch.cuda.current_device()) also be used for this purpose? It seems helpful in situations where a variable on the right GPU isn’t readily available.
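A hedged sketch of how that could look. As far as I know, DataParallel runs each replica's forward inside a torch.cuda.device(...) context, so torch.cuda.current_device() called inside forward should return that replica's GPU index, but treat this as an assumption to verify on your PyTorch version. pick_device is a hypothetical helper that prefers a reference tensor when one is available and falls back to the CPU when CUDA is absent:

```python
import torch


def pick_device(reference=None):
    """Choose a device for a new tensor (hypothetical helper)."""
    if reference is not None:
        # Safest option: follow a tensor that already lives on the right device.
        return reference.device
    if torch.cuda.is_available():
        # Falls back to whatever CUDA device is current in this context.
        return torch.device("cuda", torch.cuda.current_device())
    return torch.device("cpu")


noise = torch.randn(3, 3, device=pick_device())
same = torch.randn(3, 3, device=pick_device(torch.zeros(1)))
```

Following a reference tensor is still the more explicit choice when one exists, since it does not depend on which device happens to be current.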