About DataParallel

DataParallel splits the batch evenly across the GPUs, and each GPU then processes its share of the data. In my program, however, I need to re-integrate an intermediate output: first gather this variable from all GPUs, then reshape it with Tensor.view(), and finally scatter the reshaped data back to the GPUs. Every time I try this, the program reports an error. Is there a solution?

x = x.view(self.batch_size * self.num_segments, 3, 224, 224)
RuntimeError: shape '[12, 3, 224, 224]' is invalid for input of size 602112
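For what it's worth, the numbers in the error look consistent with the batch having been split across three GPUs before the view: each replica only receives a third of the 12 samples that the hard-coded self.batch_size * self.num_segments expects. A minimal arithmetic check (the 3-GPU count is my inference from the sizes, not something stated in the thread):

per_clip_elems = 3 * 224 * 224           # 150528 elements per 3x224x224 frame
print(602112 // per_clip_elems)          # 4  -> each replica holds 4 frames, not 12
print(12 * per_clip_elems // 602112)     # 3  -> consistent with a 3-way batch split

Inside a DataParallel replica, x.size(0) reflects only the per-GPU slice, so any view that hard-codes the full batch size will fail there; writing the reshape with -1 for that dimension, e.g. x.view(-1, 3, 224, 224), is one common way to avoid the hard-coded size.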


Hi, what you can do is use DataParallel twice inside the nn.Module, like this:

import torch
from torch import nn

class model(nn.Module):
    def __init__(self, net1, net2, device):
        super().__init__()
        self.subnet1 = nn.DataParallel(net1).to(device)
        self.subnet2 = nn.DataParallel(net2).to(device)

    def forward(self, x):
        x = self.subnet1(x)
        # Here x is gathered from all GPUs onto the output device
        x = self.subnet2(x)  # DataParallel scatters it across the GPUs again
        return x
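A minimal usage sketch of the pattern above, assuming a CUDA machine; net1 and net2 here are toy stand-ins for the two halves of the real network, which are not shown in the thread:

device = torch.device("cuda:0")
net1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
net2 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))

m = model(net1, net2, device)
out = m(torch.randn(12, 3, 224, 224))   # x is gathered onto cuda:0 between the two subnets
print(out.shape)                        # torch.Size([12, 10])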

Thank you! I understand your suggestion, but in my program both the model and x are very large: x is a torch.FloatTensor of size 32 * 32 * 96 * 28 * 28. If x were gathered onto a single GPU, that GPU would run out of memory.
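For a rough sense of scale (my back-of-the-envelope calculation, not something stated in the thread): the gathered tensor alone holds about 77 million floats, roughly 0.3 GB, so presumably the real pressure comes from the rest of the model, the autograd graph, and the other activations that would have to share that device.

elems = 32 * 32 * 96 * 28 * 28      # 77,070,336 elements
print(elems * 4 / 2**20)            # ~294 MiB for the FloatTensor itself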

Can I ask what the point of gathering everything back is?
I mean, at the per-GPU level you have

batches x segments x others

If you gather everything back, you get, at the global level,

N * batches x segments x others

and when you reshape that you get

N * batches * segments x others

But you are going to get the same ordering once you re-send everything to the GPUs, so the gather does not buy you anything (the sketch below illustrates this with toy shapes).
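A toy sketch of that ordering argument (the shapes and the two-GPU split are made up for illustration): flattening the batch and segment dimensions per replica and then concatenating gives exactly the same tensor as gathering first and reshaping once.

import torch

B, S, C = 4, 3, 5                                  # toy batch, segments, feature dims
full = torch.arange(B * S * C, dtype=torch.float32).view(B, S, C)

gathered_then_reshaped = full.view(B * S, C)       # gather first, reshape once

chunks = full.chunk(2, dim=0)                      # pretend DataParallel split over 2 GPUs
reshaped_then_gathered = torch.cat([c.reshape(-1, C) for c in chunks], dim=0)

print(torch.equal(gathered_then_reshaped, reshaped_then_gathered))   # True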

Another problem I see: if you do not have enough memory to allocate everything on a single device, how do you expect to perform the reshape there in the first place?

lstm = torch.nn.LSTM(10, 20, 1)   # input_size=10, hidden_size=20, num_layers=1
lstm.state_dict().keys()

Output:

Out[47]: odict_keys(['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0'])

According to the LSTM equations, there should be only one bias per gate. Why does the state_dict contain two bias tensors, i.e. 'bias_ih_l0' and 'bias_hh_l0'?

Answered here.
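For what it's worth, the two biases are simply summed inside each gate pre-activation, so mathematically one of them is redundant; as far as I know, they are kept separate for compatibility with the cuDNN weight layout. A quick numerical check of that redundancy:

import torch

torch.manual_seed(0)
lstm = torch.nn.LSTM(10, 20, 1)
x = torch.randn(5, 1, 10)                 # (seq_len, batch, input_size)

out_ref, _ = lstm(x)

# Fold bias_hh into bias_ih and zero bias_hh: the gates only ever see their sum,
# so the output should be unchanged.
with torch.no_grad():
    lstm.bias_ih_l0 += lstm.bias_hh_l0
    lstm.bias_hh_l0.zero_()

out_merged, _ = lstm(x)
print(torch.allclose(out_ref, out_merged, atol=1e-6))   # True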