DataParallel distributes the batch evenly across the GPUs, and each GPU then processes its share of the data. But in my program I need to re-integrate an intermediate output variable: first gather this variable from all GPUs, then use Tensor.view() to reshape it, and finally scatter the reshaped data back to the GPUs. Every time I do this, the program reports an error. Is there a solution?
x = x.view(self.batch_size * self.num_segments, 3, 224, 224)
RuntimeError: shape '[12, 3, 224, 224]' is invalid for input of size 602112
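For what it's worth, the numbers in the traceback are consistent with DataParallel's scatter step: 602112 = 4 × 3 × 224 × 224, so the replica apparently sees only a fraction of the global batch while `self.batch_size` holds the full batch size. A minimal CPU sketch of that mismatch (the shapes are assumptions inferred from the error message):

```python
import torch

# Assumed scenario: global batch of 12 split across 3 GPUs, so one
# replica receives a tensor with 4 * 3 * 224 * 224 = 602112 elements.
x = torch.randn(4, 3, 224, 224)

# Hard-coding the global batch size fails inside a replica:
try:
    x.view(12, 3, 224, 224)
except RuntimeError as e:
    print(e)  # shape '[12, 3, 224, 224]' is invalid for input of size 602112

# Letting view() infer the leading dimension works on any replica:
y = x.view(-1, 3, 224, 224)
print(tuple(y.shape))  # (4, 3, 224, 224)
```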
Hi, what you can do is use DataParallel twice inside the nn.Module, like this:
self.subnet1 = DataParallel(net1).to(device)
self.subnet2 = DataParallel(net2).to(device)
x = self.subnet1(x)
# Here x is gathered from all gpus
x = self.subnet2(x) # DataParallel again
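A runnable sketch of this pattern with placeholder sub-networks (the two `nn.Linear` layers stand in for net1/net2, which are not shown in the thread; on a CPU-only machine DataParallel simply runs the wrapped module directly):

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

class TwoStage(nn.Module):
    def __init__(self):
        super().__init__()
        # Each stage is wrapped in its own DataParallel, so the output
        # of subnet1 is gathered onto the default device before subnet2
        # scatters it across the GPUs again.
        self.subnet1 = nn.DataParallel(nn.Linear(8, 16)).to(device)
        self.subnet2 = nn.DataParallel(nn.Linear(16, 4)).to(device)

    def forward(self, x):
        x = self.subnet1(x)  # x is gathered from all GPUs here
        # full-batch reshaping could happen at this point
        x = self.subnet2(x)  # DataParallel again
        return x

model = TwoStage()
out = model(torch.randn(6, 8).to(device))
print(tuple(out.shape))  # (6, 4)
```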
Thank you! I understand what you said, but in my program the model is very big and x is also very big: its size is 32 × 32 × 96 × 28 × 28 and its type is torch.FloatTensor. If x is gathered onto one GPU, that GPU would run out of memory.
Could you let me know what the point of gathering everything back is?
On a per-GPU level you have batches × segments × others.
If you gather everything back, you get
N × batches × segments × others on a global level.
When you reshape, you get
(N × batches × segments) × others.
But you are going to get the same ordering once you re-send everything to the GPUs.
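The ordering claim can be checked on CPU with simulated per-GPU chunks (N, b, s, d are made-up sizes):

```python
import torch

# Simulate N per-GPU chunks of shape (batches, segments, others).
N, b, s, d = 2, 3, 4, 5
chunks = [torch.arange(b * s * d).float().view(b, s, d) + i * 1000
          for i in range(N)]

# Option 1: gather everything, then reshape globally.
global_reshaped = torch.cat(chunks, dim=0).view(N * b * s, d)

# Option 2: reshape each chunk on its own "GPU", then gather.
local_reshaped = torch.cat([c.view(b * s, d) for c in chunks], dim=0)

# Both orderings are identical, so the gather/reshape/scatter round
# trip is unnecessary.
print(torch.equal(global_reshaped, local_reshaped))  # True
```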
Another problem I see: if you don't have enough memory to allocate everything on the same device, how do you expect to perform the reshaping in the first place?
lstm = torch.nn.LSTM(10, 20, 1)
print(lstm.state_dict().keys())
Out: odict_keys(['weight_ih_l0', 'weight_hh_l0', 'bias_ih_l0', 'bias_hh_l0'])
According to the LSTM equations, there should be only one bias. Why are there two bias variables, 'bias_ih_l0' and 'bias_hh_l0'?
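The two biases enter every gate as a sum, so mathematically they act as a single bias of shape (4 × hidden_size,); the PyTorch docs note the second one is kept mainly for cuDNN compatibility. A quick check that folding them into one bias leaves the output unchanged:

```python
import torch

torch.manual_seed(0)
lstm = torch.nn.LSTM(10, 20, 1)

# Each gate computes sigma(W_ih x + b_ih + W_hh h + b_hh), so
# b_ih + b_hh behaves exactly like one combined bias vector.
b_ih = lstm.bias_ih_l0.detach().clone()
b_hh = lstm.bias_hh_l0.detach().clone()

x = torch.randn(5, 1, 10)  # (seq_len, batch, input_size)
out_ref, _ = lstm(x)

# Fold both biases into bias_ih and zero out bias_hh.
with torch.no_grad():
    lstm.bias_ih_l0.copy_(b_ih + b_hh)
    lstm.bias_hh_l0.zero_()
out_folded, _ = lstm(x)

print(torch.allclose(out_ref, out_folded, atol=1e-6))  # True
```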